Building Web Page Logical Structure Model towards Effective Metadata Extraction
Published in | 2010 12th International Asia-Pacific Web Conference p. 401 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.04.2010 |
Summary: | Web pages are typical semi-structured data. Several tree-based models have been proposed to describe the semantic content structure of web pages in order to facilitate further content analysis. However, most existing models present only the segmentation hierarchy of content blocks, not the semantic relationships among them. In this work, we propose a novel semantic structure model for web pages, called the Logical Structure Model, which captures more comprehensive structural information about a page. Based on this model, hidden patterns in web content can be revealed more easily. The proposed model has been used to help identify course metadata in our Online Course Organization project, which aims to build an online course portal that serves course information gathered from the Web. |
---|---|
ISBN: | 9781769540122, 9781424465996, 1424465990 |
DOI: | 10.1109/APWeb.2010.81 |