Text-similarity method based on entropy (一种基于熵的文本相似性计算方法)

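Text comparison is the process of computing the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional similarity algorithms measure text similarity mainly at the character level and ignore the effect that multiple common substrings shared by the texts have on similarity. To address this problem, the paper proposes an entropy-based similarity method: building on the character information extracted from the two texts, it establishes a measurement dimension based on common substrings and then uses entropy to measure similarity. Experiments show that the method yields a smoother similarity curve, which the authors take as evidence of the algorithm's effectiveness and accuracy.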
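The abstract names two steps (extracting common substrings, then scoring with entropy) but this record does not give the paper's formula. The sketch below is therefore only an illustration of the general idea, not the authors' method. It assumes Python's standard `difflib.SequenceMatcher` for finding common blocks. The function names, the coverage term, and the 0.5 fragmentation weight are all hypothetical choices made for this example.

```python
import math
from difflib import SequenceMatcher


def common_substrings(a: str, b: str) -> list[str]:
    """Return the non-overlapping common blocks that difflib finds in a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which is skipped here.
    return [a[m.a:m.a + m.size] for m in matcher.get_matching_blocks() if m.size > 0]


def entropy_similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]; NOT the formula from the paper.

    Coverage of the shared blocks is discounted by the normalized Shannon
    entropy of the block-length distribution, so a single long shared
    block scores higher than the same characters split across many blocks.
    """
    if not a and not b:
        return 1.0
    blocks = common_substrings(a, b)
    matched = sum(len(s) for s in blocks)
    if matched == 0:
        return 0.0
    coverage = 2.0 * matched / (len(a) + len(b))
    # Share of the matched characters that falls in each block.
    probs = [len(s) / matched for s in blocks]
    h = -sum(p * math.log2(p) for p in probs)
    # Entropy is maximal when every matched character is its own block.
    h_max = math.log2(matched) if matched > 1 else 0.0
    fragmentation = h / h_max if h_max > 0 else 0.0
    # The 0.5 weight is an arbitrary assumption for this illustration.
    return coverage * (1.0 - 0.5 * fragmentation)
```

With these assumptions, two texts that share one contiguous block score higher than two texts that share the same number of characters spread over several shorter blocks. This is one way a measure could take multiple common substrings into account rather than individual characters alone.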

Bibliographic Details
Published in 计算机应用研究 (Application Research of Computers), Vol. 33, no. 3, pp. 665-668
Main Authors 李圣文 (Li Shengwen), 凌微 (Ling Wei), 龚君芳 (Gong Junfang), 周长征 (Zhou Changzheng)
Format Journal Article
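Language Chinese
Published School of Information Engineering, China University of Geosciences, Wuhan 430074; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000; 2016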
ISSN 1001-3695
DOI 10.3969/j.issn.1001-3695.2016.03.006

Abstract_FL Text comparison is the process of finding the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional methods measure similarity at the character level and ignore the contribution of multiple common substrings shared by the texts. To address this problem, this paper proposes a text-similarity method based on entropy. The method extracts common strings from the texts, establishes a common-substring measurement dimension, and calculates the similarity based on entropy. Experiments show that the method produces a smoother similarity curve, which supports the effectiveness and accuracy of the algorithm.
AuthorAffiliation School of Information Engineering, China University of Geosciences, Wuhan 430074; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000
Author_FL Li Shengwen
Ling Wei
Gong Junfang
Zhou Changzheng
ClassificationCodes TP391.1
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
DatabaseName 维普_期刊
中文科技期刊数据库-CALIS站点
维普中文期刊数据库
中文科技期刊数据库-工程技术
中文科技期刊数据库- 镜像站点
Wanfang Data Journals - Hong Kong
WANFANG Data Centre
Wanfang Data Journals
万方数据期刊 - 香港版
China Online Journals (COJ)
Discipline Computer Science
DocumentTitleAlternate Text-similarity method based on entropy
EndPage 668
ExternalDocumentID jsjyyyj201603006
667958607
GrantInformation_xml – fundername: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan)
  funderid: (61272470); (2012119039,2012119145)
IsPeerReviewed false
IsScholarly true
Issue 3
Keywords Levenshtein distance algorithm
text similarity
string match
longest common subsequence
Notes 51-1196/TP
Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng (1. School of Information Engineering, China University of Geosciences, Wuhan 430074, China; 2. State Grid Shiyan Electric Power Company, Shiyan Hubei 442000, China)
PageCount 4
ParticipantIDs wanfang_journals_jsjyyyj201603006
chongqing_primary_667958607
PublicationCentury 2000
PublicationDate 2016
PublicationDateYYYYMMDD 2016-01-01
PublicationDate_xml – year: 2016
  text: 2016
PublicationDecade 2010
PublicationTitle 计算机应用研究
PublicationTitleAlternate Application Research of Computers
PublicationYear 2016
Publisher School of Information Engineering, China University of Geosciences, Wuhan 430074; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000
StartPage 665
SubjectTerms string matching
text similarity
longest common subsequence
edit distance algorithm
Title Text-similarity method based on entropy (一种基于熵的文本相似性计算方法)
URI http://lib.cqvip.com/qk/93231X/201603/667958607.html
https://d.wanfangdata.com.cn/periodical/jsjyyyj201603006
Volume 33