An Improved HITS Algorithm Based on Analysis of Web Page Links and Web Content Similarity

Bibliographic Details
Published in 2016 International Conference on Cyberworlds (CW), pp. 147-150
Main Author Weiming Yang
Format Conference Proceeding
Language English
Published IEEE 01.09.2016
Abstract HITS (Hyperlink-Induced Topic Search) is a classical link analysis algorithm used in Web Structure Mining (WSM). The algorithm takes the structural information of links into account but ignores the correlation between pages and topics. In some cases this leads to "topic drift", a deviation of the search results from the query topic. To address this, the paper presents an improved algorithm that takes both web content similarity and link analysis into account. The experiment shows that the improved algorithm enhances the relevance of search results and limits the occurrence of topic drift to some degree.
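The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea it describes: running the HITS hub/authority iteration with each link weighted by a content-similarity score. The weighting scheme (cosine similarity between the target page's text and the query, over raw term counts), the function names, and the toy pages are all assumptions made for illustration, not the author's method.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts using raw term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def _normalize(scores):
    """Scale scores to unit Euclidean length; all-zero scores stay unchanged."""
    length = math.sqrt(sum(v * v for v in scores.values()))
    return {p: v / length for p, v in scores.items()} if length else dict(scores)


def weighted_hits(links, texts, query, iterations=50):
    """HITS with each link src -> dst weighted by the relevance of dst.

    links: dict mapping a page to the list of pages it links to.
    texts: dict mapping every page (sources and targets) to its text.
    The weight rel[dst] is an assumed, hypothetical choice.
    """
    pages = sorted(texts)
    rel = {p: cosine_similarity(texts[p], query) for p in pages}
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: weighted sum of the hub scores of in-linking pages.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += rel[dst] * hub[src]
        auth = _normalize(new_auth)
        # Hub: weighted sum of the authority scores of linked-to pages.
        new_hub = {
            p: sum(rel[dst] * auth[dst] for dst in links.get(p, []))
            for p in pages
        }
        hub = _normalize(new_hub)
    return auth, hub


# Toy example: two hubs link to an on-topic and an off-topic page.
texts = {
    "car_review": "jaguar car engine review",
    "zoo": "jaguar big cat zoo habitat",
    "portal": "links to jaguar pages",
    "directory": "web directory",
}
links = {
    "portal": ["car_review", "zoo"],
    "directory": ["car_review", "zoo"],
}
auth, hub = weighted_hits(links, texts, "jaguar car engine")
print(sorted(auth, key=auth.get, reverse=True))
```

In plain HITS the two target pages would receive identical authority scores, because they have the same in-links. With the similarity weights, the page whose text is closer to the query ranks higher, which is the kind of topic-drift reduction the abstract refers to.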
Author Weiming Yang
Author_xml – sequence: 1
  surname: Weiming Yang
  fullname: Weiming Yang
  email: ywm519@163.com
  organization: Coll. of Comput. & Inf. Sci., Chongqing Normal Univ., Chongqing, China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CW.2016.30
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9781509023035
1509023038
EndPage 150
ExternalDocumentID 7756141
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
IEDL.DBID RIE
IngestDate Thu Jun 29 18:38:03 EDT 2023
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 4
ParticipantIDs ieee_primary_7756141
PublicationCentury 2000
PublicationDate 2016-Sept.
PublicationDateYYYYMMDD 2016-09-01
PublicationDate_xml – month: 09
  year: 2016
  text: 2016-Sept.
PublicationDecade 2010
PublicationTitle 2016 International Conference on Cyberworlds (CW)
PublicationTitleAbbrev CYBER
PublicationYear 2016
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.651776
SourceID ieee
SourceType Publisher
StartPage 147
SubjectTerms Algorithm design and analysis
Authority page
Computational efficiency
Computers
Correlation
Crawlers
HITS algorithm
Hub page
Symmetric matrices
Web content similarity
Web pages
Title An Improved HITS Algorithm Based on Analysis of Web Page Links and Web Content Similarity
URI https://ieeexplore.ieee.org/document/7756141