Understanding web documents: finding pagelets for transformation using structural patterns

Publisher
Inderscience Publishers
Copyright
Copyright © Inderscience Enterprises Ltd. All rights reserved
ISSN
1476-1289
eISSN
1741-9212
DOI
10.1504/IJWET.2008.019537

Abstract

Understanding a web document and the sections inside it is important for web transformation and for information retrieval from web pages. Detecting pagelets, which are small features located inside a web page, in order to understand a web document's structure is a difficult problem. Current work on pagelet detection focuses only on finding the location of a pagelet, without regard to its functionality. We describe a method that detects both the location and the functionality of pagelets using HTML element patterns. For each pagelet type, an HTML element pattern is created and matched against a web page. Sections of the web page that match a pattern are marked as pagelet candidates. We test this technique on multiple popular web pages from the news and e-commerce genres. We find that this method adequately recalls various pagelets from these pages.
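
The abstract does not give the patterns themselves, so the following is only a minimal sketch of the general idea, assuming two hypothetical pagelet types: a "navigation" pagelet (a list whose items all contain links) and a "search" pagelet (a form containing a text input). The names `Node`, `TreeBuilder`, `PATTERNS`, and `find_pagelets`, along with the three-item threshold, are illustrative assumptions and are not taken from the article. Each pattern is a structural test on an element subtree, and every element that passes a test is reported as a candidate labelled with its functionality.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so the parser must not descend into them.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One HTML element in a simple in-memory tree."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a Node tree from markup using the standard-library parser."""

    def __init__(self):
        super().__init__()
        self.root = Node("root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def descendants(node):
    """Yield every element below node in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def is_navigation(node):
    # Hypothetical pattern: a list with at least three items, each holding a link.
    if node.tag not in ("ul", "ol"):
        return False
    items = [c for c in node.children if c.tag == "li"]
    return len(items) >= 3 and all(
        any(d.tag == "a" for d in descendants(item)) for item in items
    )


def is_search(node):
    # Hypothetical pattern: a form that contains a text or search input.
    if node.tag != "form":
        return False
    return any(
        d.tag == "input" and d.attrs.get("type", "text") in ("text", "search")
        for d in descendants(node)
    )


# Pagelet type -> structural pattern (both entries are assumptions).
PATTERNS = {"navigation": is_navigation, "search": is_search}


def find_pagelets(html):
    """Return (pagelet type, tag, id) for every element matching a pattern."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return [
        (label, node.tag, node.attrs.get("id"))
        for node in descendants(builder.root)
        for label, matches in PATTERNS.items()
        if matches(node)
    ]


SAMPLE = """
<div id="header">
  <ul id="menu">
    <li><a href="/world">World</a></li>
    <li><a href="/sport">Sport</a></li>
    <li><a href="/tech">Tech</a></li>
  </ul>
  <form id="site-search" action="/search">
    <input type="text" name="q"><input type="submit">
  </form>
</div>
<ul id="notes"><li>plain item</li><li>another</li></ul>
"""
```

Calling `find_pagelets(SAMPLE)` marks the `menu` list as a navigation candidate and the `site-search` form as a search candidate, while the `notes` list is skipped because its items contain no links. Writing each pattern as a separate predicate keeps the pagelet type (functionality) attached to the matched subtree (location), which is the pairing the abstract describes.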

Journal

International Journal of Web Engineering and Technology (Inderscience Publishers)

Published: Jan 1, 2008
