CoCITe-Coordinating Changes in Text
Published in | IEEE transactions on knowledge and data engineering Vol. 24; no. 1; pp. 15 - 29 |
---|---|
Main Authors | Wright, J. H.; Grothendieck, J. |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.01.2012, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Text streams are ubiquitous and contain a wealth of information, but are typically orders of magnitude too large in scale for comprehensive human inspection. There is a need for tools that can detect and group changes occurring within text streams and substreams, in order to find, structure, and summarize these changes for presentation to human analysts. This paper describes a procedure for efficiently finding step changes, trends, bursts, and cyclic changes affecting frequencies of words, or more general lexical items, within streams of documents which may be optionally labeled with metadata. The common phenomenon of over-dispersion is accommodated using mixture distributions. A streaming implementation is described which can process data from a continuous feed. Anomalies can be detected, grouped, and rendered visually for human comprehension. |
---|---|
Author | Grothendieck, J.; Wright, J. H. |
Author_xml | – sequence: 1 givenname: J. H. surname: Wright fullname: Wright, J. H. email: jwright@research.att.com organization: AT&T Labs.-Res., Florham Park, NJ, USA – sequence: 2 givenname: J. surname: Grothendieck fullname: Grothendieck, J. email: jgrothen@bbn.com organization: Raytheon BBN Technol., Columbia, MD, USA |
CODEN | ITKEEH |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Jan 2012 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Jan 2012 |
DOI | 10.1109/TKDE.2010.250 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Technology Research Database |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering Computer Science |
EISSN | 1558-2191 |
EndPage | 29 |
ExternalDocumentID | 2553145531 10_1109_TKDE_2010_250 5674040 |
Genre | orig-research |
ISSN | 1041-4347 |
IngestDate | Fri Sep 13 04:47:30 EDT 2024 Fri Aug 23 01:04:21 EDT 2024 Wed Jun 26 19:28:22 EDT 2024 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
PQID | 913728042 |
PQPubID | 85438 |
PageCount | 15 |
ParticipantIDs | crossref_primary_10_1109_TKDE_2010_250 ieee_primary_5674040 proquest_journals_913728042 |
PublicationCentury | 2000 |
PublicationDate | 2012-Jan. 2012-01-00 20120101 |
PublicationDateYYYYMMDD | 2012-01-01 |
PublicationDate_xml | – month: 01 year: 2012 text: 2012-Jan. |
PublicationDecade | 2010 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE transactions on knowledge and data engineering |
PublicationTitleAbbrev | TKDE |
PublicationYear | 2012 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 15 |
SubjectTerms | Data models; Dynamic programming; Heuristic algorithms; modeling structured; Multimedia communication; Statistical analysis; Statistical software; Text mining; textual and multimedia data; Time frequency analysis |
Title | CoCITe-Coordinating Changes in Text |
URI | https://ieeexplore.ieee.org/document/5674040 https://www.proquest.com/docview/913728042/abstract/ |
Volume | 24 |
hasFullText | 1 |
inHoldings | 1 |