Building Web Page Logical Structure Model towards Effective Metadata Extraction

Bibliographic Details
Published in: 2010 12th International Asia-Pacific Web Conference, p. 401
Main Authors: Baoyao Zhou, Ming Zhang
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2010

Summary: Web pages are typical semi-structured data. Several tree-based models have been proposed to describe the semantic content structure of web pages in order to facilitate further content analysis. However, most existing models present only the segmentation hierarchy of content blocks, not the semantic relationships among them. In this work, we propose a novel semantic structure model for web pages, called the Logical Structure Model, which captures more comprehensive structural information about a page. With this model, hidden patterns in web content can be revealed more easily. The proposed model has been used to help identify course metadata in our Online Course Organization project, which aims to build an online course portal that serves course information obtained from the Web.
ISBN: 9781769540122, 9781424465996, 1424465990
DOI: 10.1109/APWeb.2010.81