Web content extraction based on subject detection and node density

Bibliographic Details
Published in	2015 7th International Conference on Knowledge and Smart Technology (KST), pp. 121-125
Main Authors	Petprasit, Warid; Jaiyen, Saichon
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2015
Subjects	Cascading style sheets; data intensive; Data mining; e-commerce; node density (SDND); subject detection; Uniform resource locators; web content extraction; web information extraction; web mining; Web pages; wrapper induction; XML
Summary:	Very large volumes of data are now transferred across the World Wide Web. Consequently, information extraction systems have emerged, and much research has focused on making use of these data. Such systems are very useful for data pre-processing and cleaning in real-time applications. Moreover, they enable other analysis systems to process the data in real time, for example in social network mining, web mining, and data mining, or in specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. In this paper, we focus on extracting the content data of web pages on e-commerce web sites based on subject detection and node density. The experimental results indicate that the proposed method is suitable for automatically extracting the data-rich region of data-intensive pages.
ISBN:	9781479960484 1479960489
DOI:	10.1109/KST.2015.7051455