Novelty detection for text documents using named entity recognition
Published in | 2007 6th International Conference on Information, Communications and Signal Processing pp. 1 - 5 |
---|---|
Main Authors | Kok Wah Ng, F.S. Tsai, Lihui Chen, Kiat Chong Goh |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2007 |
Subjects | |
ISBN | 1424409829 9781424409822 |
DOI | 10.1109/ICICS.2007.4449883 |
Abstract | To identify novel information in raw text documents, a novelty detection recommender system was developed that compares various types of entities within sentences. Novel sentences were first detected using named entity recognition to extract entities of type person, place, time, and organization. In addition, part-of-speech tagging was applied to every word in the documents, so that nouns, verbs, and adjectives could also be used for comparison. WordNet, an English lexical database of concepts and relations, was incorporated to generate synonyms for the entities and parts of speech and to determine the similarity of sentences. The novelty score of each sentence was computed with two metrics, UniqueComparison and ImportanceValue. UniqueComparison counted the number of matched entities, whereas ImportanceValue took into account the total weight of the matched words that occurred in both the test and history sentences. The results are promising when compared with the benchmark scores for the Text Retrieval Conference (TREC) Novelty Track 2004, suggesting that combining named entity recognition with part-of-speech tagging can detect novelty effectively. |
---|---|
Author | Tsai, F.S.; Kiat Chong Goh; Kok Wah Ng; Lihui Chen |
Author_xml | – sequence: 1 surname: Kok Wah Ng fullname: Kok Wah Ng organization: Nanyang Technol. Univ., Singapore – sequence: 2 givenname: F.S. surname: Tsai fullname: Tsai, F.S. organization: Nanyang Technol. Univ., Singapore – sequence: 3 surname: Lihui Chen fullname: Lihui Chen organization: Nanyang Technol. Univ., Singapore – sequence: 4 surname: Kiat Chong Goh fullname: Kiat Chong Goh |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9781424409839 1424409837 |
EndPage | 5 |
ExternalDocumentID | 4449883 |
Genre | orig-research |
GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AARBI AAWTH ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IERZE OCL RIE RIL |
ID | FETCH-LOGICAL-i90t-f1af5c5082f8abfd8a7328431f821d34dd1bf95951a49f9af18de8fc8ac41ba03 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 01:47:31 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i90t-f1af5c5082f8abfd8a7328431f821d34dd1bf95951a49f9af18de8fc8ac41ba03 |
PageCount | 5 |
ParticipantIDs | ieee_primary_4449883 |
PublicationCentury | 2000 |
PublicationDate | 2007-Dec. |
PublicationDateYYYYMMDD | 2007-12-01 |
PublicationDate_xml | – month: 12 year: 2007 text: 2007-Dec. |
PublicationDecade | 2000 |
PublicationTitle | 2007 6th International Conference on Information, Communications and Signal Processing |
PublicationTitleAbbrev | ICICS |
PublicationYear | 2007 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0001764026 |
Score | 1.5052834 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1 |
SubjectTerms | Data mining; Equations; Frequency; Information processing; novelty detection; recommender system; Recommender systems; Speech; Tagging; Testing; text mining; Text recognition |
Title | Novelty detection for text documents using named entity recognition |
URI | https://ieeexplore.ieee.org/document/4449883 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
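
The abstract names two novelty metrics, UniqueComparison and ImportanceValue, but this record does not give their formulas. The following is only a minimal Python sketch of how such overlap scores might be computed, assuming the entities and tagged words have already been extracted and lowercased. The function names, the `synonyms` mapping (standing in for a WordNet lookup), the `weights` table, and the default weight are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Set


def expand(terms: Iterable[str], synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the terms plus any synonyms listed for them (assumed lookup table)."""
    expanded: Set[str] = set()
    for term in terms:
        expanded.add(term)
        expanded |= synonyms.get(term, set())
    return expanded


def unique_comparison(test_terms: Iterable[str],
                      history_terms: Iterable[str],
                      synonyms: Optional[Dict[str, Set[str]]] = None) -> int:
    """Count distinct test terms that match a history term or one of its synonyms."""
    history = expand(history_terms, synonyms or {})
    return len({term for term in test_terms if term in history})


def importance_value(test_terms: Iterable[str],
                     history_terms: Iterable[str],
                     weights: Dict[str, float],
                     default_weight: float = 1.0) -> float:
    """Sum the weights of distinct terms present in both sentences.

    The weighting scheme is an assumption; terms without an entry in
    `weights` fall back to `default_weight`.
    """
    matched = set(test_terms) & set(history_terms)
    return sum(weights.get(term, default_weight) for term in matched)


if __name__ == "__main__":
    test = ["car", "paris", "2004"]
    history = ["automobile", "london", "2004"]
    print(unique_comparison(test, history, {"automobile": {"car"}}))
    print(importance_value(test, history, {"2004": 2.0}))
```

In a sketch like this, a high score means the test sentence overlaps heavily with the history sentences, so a sentence would be treated as novel only when its score stays below some threshold. How the paper turns these scores into a novelty decision is not stated in this record.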