Text-similarity method based on entropy
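Text comparison is the process of computing the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional similarity algorithms measure text similarity mainly at the character level and ignore the effect that multiple common substrings within the texts have on similarity. To address this problem, an entropy-based similarity method is proposed: building on the extraction of character information shared by the texts, it establishes common-substring measurement dimensions and then uses entropy to measure similarity. Experiments show that the method yields a smoother similarity curve, which supports the effectiveness and accuracy of the algorithm.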
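The record does not reproduce the paper's formula, so the following is only a minimal sketch of how common substrings and entropy could be combined into a similarity score. Everything in it is an assumption made for illustration: the function names `common_blocks` and `entropy_similarity`, the use of Python's `difflib` to find common substrings, and the scoring rule (coverage of the texts by common substrings, discounted by the normalized entropy of the substring-length distribution). It should not be read as the authors' method.

```python
import math
from difflib import SequenceMatcher


def common_blocks(a: str, b: str, min_len: int = 1) -> list:
    """Return the non-overlapping common substrings that difflib aligns.

    Hypothetical helper: difflib stands in for whatever extraction
    step the paper actually uses.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    threshold = max(min_len, 1)  # also drops difflib's size-0 sentinel block
    return [
        a[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= threshold
    ]


def entropy_similarity(a: str, b: str, min_len: int = 1) -> float:
    """Illustrative score in [0, 1]; NOT the formula from the paper.

    coverage : share of both texts covered by common substrings
    entropy  : Shannon entropy of the substring-length distribution;
               many short fragments give high entropy, one long
               shared block gives zero entropy
    """
    if not a and not b:
        return 1.0
    blocks = common_blocks(a, b, min_len)
    matched = sum(len(s) for s in blocks)
    if matched == 0:
        return 0.0
    coverage = 2 * matched / (len(a) + len(b))
    probs = [len(s) / matched for s in blocks]
    entropy = -sum(p * math.log(p) for p in probs)
    # Upper bound: every matched character forms its own block.
    max_entropy = math.log(matched) if matched > 1 else 1.0
    return coverage * (1 - entropy / max_entropy)


if __name__ == "__main__":
    print(entropy_similarity("abcdef", "abcdef"))
    print(entropy_similarity("abcdXefgh", "abcdefgh"))
    print(entropy_similarity("abc", "xyz"))
```

Under this assumed rule, two texts that share one long block score higher than two texts that share the same number of characters scattered across several short fragments, which is one way to make the score sensitive to multiple common substrings rather than to individual characters alone.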
Published in | Application Research of Computers (计算机应用研究) Vol. 33; no. 3; pp. 665 - 668 |
---|---|
Main Author | Li Shengwen |
Format | Journal Article |
Language | Chinese |
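Published | School of Information Engineering, China University of Geosciences, Wuhan 430074, China; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China, 2016 |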
Subjects | String matching; text similarity; longest common subsequence; edit distance algorithm |
Online Access | Get full text |
ISSN | 1001-3695 |
DOI | 10.3969/j.issn.1001-3695.2016.03.006 |
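Abstract | Text comparison is the process of computing the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional similarity algorithms measure text similarity mainly at the character level and ignore the effect that multiple common substrings within the texts have on similarity. To address this problem, an entropy-based similarity method is proposed: building on the extraction of character information shared by the texts, it establishes common-substring measurement dimensions and then uses entropy to measure similarity. Experiments show that the method yields a smoother similarity curve, which supports the effectiveness and accuracy of the algorithm. |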
---|---|
Abstract_FL | Text comparison is the process of measuring the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional methods measure similarity at the character level and ignore the contribution of multiple common substrings shared by the texts. To address this problem, this paper proposes a text-similarity method based on entropy. The method extracts common strings from the texts, establishes common-substring measurement dimensions, and calculates similarity based on entropy. Experiments show that the method produces a smoother similarity curve, supporting the effectiveness and accuracy of the algorithm. |
Author | Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng |
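AuthorAffiliation | School of Information Engineering, China University of Geosciences, Wuhan 430074, China; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China |
AuthorAffiliation_xml | – name: School of Information Engineering, China University of Geosciences, Wuhan 430074, China; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China |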
Author_FL | Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng |
Author_FL_xml | – sequence: 1 fullname: Li Shengwen – sequence: 2 fullname: Ling Wei – sequence: 3 fullname: Gong Junfang – sequence: 4 fullname: Zhou Changzheng |
Author_xml | – sequence: 1 fullname: Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng |
ClassificationCodes | TP391.1 |
ContentType | Journal Article |
Copyright | Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
Copyright_xml | – notice: Copyright © Wanfang Data Co. Ltd. All Rights Reserved. |
DBID | 2RA 92L CQIGP W92 ~WA 2B. 4A8 92I 93N PSX TCJ |
DOI | 10.3969/j.issn.1001-3695.2016.03.006 |
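DatabaseName | VIP Journals; Chinese Scientific Journals Database (CALIS site); VIP Chinese Journals Database; Chinese Scientific Journals Database (Engineering Technology); Chinese Scientific Journals Database (mirror site); Wanfang Data Journals - Hong Kong; WANFANG Data Centre; Wanfang Data Journals; Wanfang Data Journals (Hong Kong edition); China Online Journals (COJ) |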
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
DocumentTitleAlternate | Text-similarity method based on entropy |
DocumentTitle_FL | Text-similarity method based on entropy |
EndPage | 668 |
ExternalDocumentID | jsjyyyj201603006 667958607 |
GrantInformation_xml | – fundername: National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities, China University of Geosciences (Wuhan) funderid: (61272470); (2012119039,2012119145) |
GroupedDBID | -0Y 2B. 2C0 2RA 5XA 5XJ 92H 92I 92L ACGFS ALMA_UNASSIGNED_HOLDINGS CCEZO CQIGP CUBFJ CW9 TCJ TGT U1G U5S W92 ~WA 4A8 93N ABJNI PSX |
ID | FETCH-LOGICAL-c606-c27f69bef1dea50912611330b67b434dac9eb4eeeccfc7eb8642f5bc03b5e7ac3 |
ISSN | 1001-3695 |
IngestDate | Thu May 29 03:54:50 EDT 2025 Wed Feb 14 10:24:40 EST 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 3 |
Keywords | Levenshtein distance algorithm (edit distance algorithm); text similarity; string matching; longest common subsequence |
Language | Chinese |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c606-c27f69bef1dea50912611330b67b434dac9eb4eeeccfc7eb8642f5bc03b5e7ac3 |
Notes | 51-1196/TP Li Shengwen, Ling Wei, Gong Junfang, Zhou Changzheng (1. School of Information Engineering, China University of Geosciences, Wuhan 430074, China; 2. State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China) Text comparison is the process of measuring the similarity between two texts; the higher the similarity, the more alike the two texts are. Traditional methods measure similarity at the character level and ignore the contribution of multiple common substrings shared by the texts. To address this problem, this paper proposes a text-similarity method based on entropy. The method extracts common strings from the texts, establishes common-substring measurement dimensions, and calculates similarity based on entropy. Experiments show that the method produces a smoother similarity curve, supporting the effectiveness and accuracy of the algorithm. Keywords: text similarity; string matching; Levenshtein distance algorithm; longest common subsequence |
PageCount | 4 |
ParticipantIDs | wanfang_journals_jsjyyyj201603006 chongqing_primary_667958607 |
PublicationCentury | 2000 |
PublicationDate | 2016 |
PublicationDateYYYYMMDD | 2016-01-01 |
PublicationDate_xml | – year: 2016 text: 2016 |
PublicationDecade | 2010 |
PublicationTitle | 计算机应用研究 |
PublicationTitleAlternate | Application Research of Computers |
PublicationYear | 2016 |
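Publisher | School of Information Engineering, China University of Geosciences, Wuhan 430074, China; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China |
Publisher_xml | – name: School of Information Engineering, China University of Geosciences, Wuhan 430074, China; State Grid Shiyan Electric Power Company, Shiyan, Hubei 442000, China |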
SSID | ssj0042190 ssib001102940 ssib002263599 ssib023646305 ssib051375744 ssib025702191 |
Score | 2.0208395 |
SourceID | wanfang chongqing |
SourceType | Aggregation Database Publisher |
StartPage | 665 |
SubjectTerms | String matching; text similarity; longest common subsequence; edit distance algorithm |
Title | Text-similarity method based on entropy |
URI | http://lib.cqvip.com/qk/93231X/201603/667958607.html https://d.wanfangdata.com.cn/periodical/jsjyyyj201603006 |
Volume | 33 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnR1NaxNBdAgtiBe_xVqVCp2TbM3uzM7HcTfZUDx4itBb2NnstuSQVpse0lMPoie9WUEF8dSL9CJCW_pzYuLP8L3ZyTYULSqEYXgz896bfZv33szOvEfIciqCNE994zEumMdNl3tpGmb49RH9ZzBguY32-VSsPuNP1sK1Wu145tTSzsCsZLu_vVfyP1IFGMgVb8n-g2QrpACAOsgXSpAwlH8lY5pwGis8rJBIGkmqmjQJqW7ROLJNEVUJNilB4xArGiCcJoJqQZW0lQaNGrYpRlQ4qoE_aAK0gDNRNEpo5FsSCdXSDY81VmJGy-yVUwf3D_0blqUQS80tOU4jZfvUHSTSNK62Ce2gpmU_tDyyiutHCALmVcMiBFoJgDTWtKWh6zgZ4ENFwJ_trn1LTQOzNG65cWUSl-mWR3kX0-lnPAHGhJuZU-BlJA33orIZbSzKNBTOsIsyf895m8G00NZmIIGVigCe-hNl_Ntzobqt8RdC6lAJjGMwH0jpg0Kdj-Jm3DrzRcF1m41NGGDYn7O1HwbuFzPKFrMJgvWolG3oMxna1ASlW8GhsQyt4Ri8RJYd948v4h1jhmxs9tefgydkL6b1i7S_PuNDta-RK27xsxSVb_J1UtvduEGuThOLLDk7c5N4o6O9ycGbH59PRidvJ6--Tz68HO-_Hn_6Ovl4NDo9He8d_Dz8Mjl8P94_Hn97d4u0W0m7seq5vB5eBstlLwtkIbTJC7-bp-ivwiLeZ6xuhDSc8W6a6dzwPAflUmQyNwqWyEVosjozYS7TjN0mc_3Nfn6HLHVzERhjMtU1hkuDt8ALGWAQRlUov5ALZLGafGerDN_SqUS3QB66x9Fxf-rtTm-7NxwOe4FNvw6P7-6FGBbJZexZbsndI3ODFzv5fXBSB-aBex1-AV4PbcI |
linkProvider | EBSCOhost |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=%E4%B8%80%E7%A7%8D%E5%9F%BA%E4%BA%8E%E7%86%B5%E7%9A%84%E6%96%87%E6%9C%AC%E7%9B%B8%E4%BC%BC%E6%80%A7%E8%AE%A1%E7%AE%97%E6%96%B9%E6%B3%95&rft.jtitle=%E8%AE%A1%E7%AE%97%E6%9C%BA%E5%BA%94%E7%94%A8%E7%A0%94%E7%A9%B6&rft.au=%E6%9D%8E%E5%9C%A3%E6%96%87+%E5%87%8C%E5%BE%AE+%E9%BE%9A%E5%90%9B%E8%8A%B3+%E5%91%A8%E9%95%BF%E5%BE%81&rft.date=2016&rft.issn=1001-3695&rft.volume=33&rft.issue=3&rft.spage=665&rft.epage=668&rft_id=info:doi/10.3969%2Fj.issn.1001-3695.2016.03.006&rft.externalDocID=667958607 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fimage.cqvip.com%2Fvip1000%2Fqk%2F93231X%2F93231X.jpg http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=http%3A%2F%2Fwww.wanfangdata.com.cn%2Fimages%2FPeriodicalImages%2Fjsjyyyj%2Fjsjyyyj.jpg |