Web content extraction based on subject detection and node density

Bibliographic Details
Published in 2015 7th International Conference on Knowledge and Smart Technology (KST), pp. 121-125
Main Authors Petprasit, Warid; Jaiyen, Saichon
Format Conference Proceeding
Language English
Published IEEE 01.01.2015

Abstract Currently, very large amounts of data are transferred from everywhere through the World Wide Web. Consequently, information extraction systems have emerged, and much research has focused on making use of these data. Such systems are very useful for data pre-processing and cleaning in real-time applications. Moreover, they enable other analysis systems to process the data in real time, for example in social network mining, web mining, and data mining, or in specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. In this paper, we focus on extracting the content data of web pages on e-commerce web sites based on subject detection and node density. The experimental results indicate that the proposed method is suitable for automatically extracting the data-rich region of data-intensive pages.

Author Affiliations
Petprasit, Warid (s5650804@kmitl.ac.th), Department of Computer Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Jaiyen, Saichon (kjsaicho@kmitl.ac.th), Department of Computer Science, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
BookMark eNo1j0FLAzEUhCPqwdbeBS_5A7u-bPI2m6MWrWLBgxWP5SV5CxHNym4E--8ttJ5m5hsYmJk4y0NmIa4U1EqBu3l-3dQNKKwtoDKIJ2KmjHWuBePsqVg42_3nzlyIu3f2Mgy5cC6Sf8tIoaQhS08TR7k304__4FBk5MKHinKUeYi8R3lKZXcpznv6nHhx1Ll4e7jfLB-r9cvqaXm7rpJqdKla0zjCAETIHE3AXkXbA7aROo0OyUNgHcBr3XkTewCvWLP15FvLptVzcX3YTcy8_R7TF4277fGm_gPkv0o3
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/KST.2015.7051455
EISBN 9781479960491, 1479960497
ISBN 9781479960484, 1479960489
Subjects Cascading style sheets; data intensive; Data mining; e-commerce; node density (SDND); subject detection; Uniform resource locators; web content extraction; web information extraction; web mining; Web pages; wrapper induction; XML
URI https://ieeexplore.ieee.org/document/7051455