Novelty detection for text documents using named entity recognition

Bibliographic Details
Published in 2007 6th International Conference on Information, Communications and Signal Processing, pp. 1-5
Main Authors Kok Wah Ng, Tsai, F.S., Lihui Chen, Kiat Chong Goh
Format Conference Proceeding
Language English
Published IEEE 01.12.2007
ISBN 1424409829, 9781424409822
DOI 10.1109/ICICS.2007.4449883

Abstract To identify novel information in raw text documents, a novelty detection recommender system was developed that compares various types of entities within sentences. We first detected novel sentences by using named entity recognition to extract the entity types person, place, time, and organization. In addition, part-of-speech tagging was performed on each word in the documents, allowing the syntactic categories noun, verb, and adjective to be used for comparisons. WordNet, an English lexical database of concepts and relations, was also incorporated to generate synonyms for the entities and parts of speech and to determine the similarity of sentences. The novelty score of each sentence was determined using two metrics, UniqueComparison and ImportanceValue. UniqueComparison counted the number of matched entities, whereas ImportanceValue took into account the total weight of matched words that occurred in both the test and history sentences. The results are promising when compared with the benchmark scores of the Text Retrieval Conference (TREC) 2004 Novelty Track. This demonstrates that combining named entity recognition with part-of-speech tagging can detect novelty with good results.
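The abstract names the two scores but does not give their formulas. The Python sketch below shows one plausible reading, assuming sentences have already been processed by a named entity recognizer and a part-of-speech tagger. Everything concrete in it is hypothetical and not taken from the paper: the `(type, text)` and `(word, tag)` tuple representations, the function names `unique_comparison` and `importance_value`, the small `synonyms` dictionary that stands in for WordNet lookups, and the per-tag weights (`NN`, `VB`, `JJ`).

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representations (not from the paper):
# an entity is (entity_type, normalized_text), e.g. ("PERSON", "smith");
# a tagged word is (word, pos_tag), e.g. ("merger", "NN").
Entity = Tuple[str, str]
TaggedWord = Tuple[str, str]


def expand_synonyms(word: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the word plus its synonyms; `synonyms` stands in for a WordNet lookup."""
    return {word} | synonyms.get(word, set())


def unique_comparison(test: Set[Entity], history: Iterable[Set[Entity]],
                      synonyms: Dict[str, Set[str]]) -> int:
    """Count test entities already seen in any history sentence (same type, same word or a synonym)."""
    seen: Set[Entity] = set()
    for sentence in history:
        for etype, text in sentence:
            for variant in expand_synonyms(text, synonyms):
                seen.add((etype, variant))
    return sum(1 for entity in test if entity in seen)


def importance_value(test: Set[TaggedWord], history: Iterable[Set[TaggedWord]],
                     pos_weights: Dict[str, float]) -> float:
    """Sum the POS-based weights of test words that also occur in some history sentence."""
    history_words: Set[TaggedWord] = set().union(*history)
    return sum(pos_weights.get(pos, 0.0)
               for word, pos in test if (word, pos) in history_words)


if __name__ == "__main__":
    synonyms = {"firm": {"company", "business"}}  # toy stand-in for WordNet
    test_entities = {("PERSON", "smith"), ("PLACE", "paris"), ("ORGANIZATION", "company")}
    history_entities = [{("PERSON", "smith")}, {("ORGANIZATION", "firm")}]
    print(unique_comparison(test_entities, history_entities, synonyms))  # 2

    weights = {"NN": 1.0, "VB": 0.5, "JJ": 0.25}  # hypothetical POS weights
    test_words = {("merger", "NN"), ("announce", "VB"), ("new", "JJ")}
    history_words = [{("merger", "NN")}, {("new", "JJ"), ("report", "NN")}]
    print(importance_value(test_words, history_words, weights))  # 1.25
```

Under this reading both scores measure overlap with the history, so a higher score marks a less novel sentence. How the paper turns these scores into a final novel/not-novel decision (for example, a threshold) is not stated in the abstract.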
Affiliation Nanyang Technol. Univ., Singapore (Kok Wah Ng; Tsai, F.S.; Lihui Chen)
EISBN 9781424409839, 1424409837
Subjects Data mining; Equations; Frequency; Information processing; novelty detection; recommender system; Recommender systems; Speech; Tagging; Testing; text mining; Text recognition
URI https://ieeexplore.ieee.org/document/4449883