Text Mining: Predictive Methods for Analyzing Unstructured Information

Bibliographic Details
Main Authors Damerau, Fred; Indurkhya, Nitin; Weiss, Sholom M.; Zhang, Tong
Format eBook, Book
Language English
Published New York, NY: Springer-Verlag, 2004
Edition 1st ed.
ISBN 0387954333, 9780387954332
DOI 10.1007/978-0-387-34555-0

Abstract The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly available information. Huge amounts of information are private and of interest to local communities, such as the customer records of a business. This information is overwhelmingly text and serves a record-keeping purpose, but an automated analysis might be desirable to find patterns in the stored records. Text mining is analogous to this kind of data mining: it also finds patterns and trends in information samples, but it works with far less structured ingredients, though ones of greater immediate utility for users. This book focuses on the concepts and methods needed to expand horizons beyond structured, numeric data to automated mining of text samples. It introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing, information retrieval, and search. It also explores new research areas that rely on evolving text-mining techniques, such as information extraction and document summarization.
Contents: Overview of text mining; From textual information to numerical vectors; Using text for prediction; Information retrieval and text mining; Finding structure in a document collection; Looking for information in documents; Case studies; Emerging directions; Appendix: software notes; References; Author and subject indexes.
Copyright Springer-Verlag New York 2005
DEWEY 006.312
Discipline Computer Science
EISBN 9780387345550, 0387345558
Keywords Computer, Computer science
LCCN 2004056524
LC Call Number QA76.9.D343
Notes Includes bibliographical references (p. [217]-228) and indexes
OCLC 525024974
PageCount 247
SubjectTerms Computer Appl. in Administrative Data Processing
Computer Science
Data mining
Data Mining and Knowledge Discovery
Database Management
Information Storage and Retrieval
Information Systems and Communication Service
Natural Language Processing (NLP)
TableOfContents Preface
1 Overview of Text Mining -- 1.1 What's Special about Text Mining? -- 1.1.1 Structured or Unstructured Data? -- 1.1.2 Is Text Different from Numbers? -- 1.2 What Types of Problems Can Be Solved? -- 1.3 Document Classification -- 1.4 Information Retrieval -- 1.5 Clustering and Organizing Documents -- 1.6 Information Extraction -- 1.7 Prediction and Evaluation -- 1.8 The Next Chapters -- 1.9 Historical and Bibliographical Remarks
2 From Textual Information to Numerical Vectors -- 2.1 Collecting Documents -- 2.2 Document Standardization -- 2.3 Tokenization -- 2.4 Lemmatization -- 2.4.1 Inflectional Stemming -- 2.4.2 Stemming to a Root -- 2.5 Vector Generation for Prediction -- 2.5.1 Multiword Features -- 2.5.2 Labels for the Right Answers -- 2.5.3 Feature Selection by Attribute Ranking -- 2.6 Sentence Boundary Determination -- 2.7 Part-Of-Speech Tagging -- 2.8 Word Sense Disambiguation -- 2.9 Phrase Recognition -- 2.10 Named Entity Recognition -- 2.11 Parsing -- 2.12 Feature Generation -- 2.13 Historical and Bibliographical Remarks
3 Using Text for Prediction -- 3.1 Recognizing that Documents Fit a Pattern -- 3.2 How Many Documents Are Enough? -- 3.3 Document Classification -- 3.4 Learning to Predict from Text -- 3.4.1 Similarity and Nearest-Neighbor Methods -- 3.4.2 Document Similarity -- 3.4.3 Decision Rules -- 3.4.3.1 How to Find the Best Decision Rules -- 3.4.4 Scoring by Probabilities -- 3.4.5 Linear Scoring Methods -- 3.4.5.1 How to Find the Best Scoring Model -- 3.5 Evaluation of Performance -- 3.5.1 Estimating Current and Future Performance -- 3.5.2 Getting the Most from a Learning Method -- 3.6 Applications -- 3.7 Historical and Bibliographical Remarks
4 Information Retrieval and Text Mining -- 4.1 Is Information Retrieval a Form of Text Mining? -- 4.2 Key Word Search -- 4.3 Nearest-Neighbor Methods -- 4.4 Measuring Similarity -- 4.4.1 Shared Word Count -- 4.4.2 Word Count and Bonus -- 4.4.3 Cosine Similarity -- 4.5 Web-Based Document Search -- 4.5.1 Link Analysis -- 4.6 Document Matching -- 4.7 Inverted Lists -- 4.8 Evaluation of Performance -- 4.9 Historical and Bibliographical Remarks
5 Finding Structure in a Document Collection -- 5.1 Clustering Documents by Similarity -- 5.2 Similarity of Composite Documents -- 5.2.1 k-Means Clustering -- 5.2.1.1 Centroid Classifier -- 5.2.2 Hierarchical Clustering -- 5.2.3 The EM Algorithm -- 5.3 What Do a Cluster's Labels Mean? -- 5.4 Applications -- 5.5 Evaluation of Performance -- 5.6 Historical and Bibliographical Remarks
6 Looking for Information in Documents -- 6.1 Goals of Information Extraction -- 6.2 Finding Patterns and Entities from Text -- 6.2.1 Entity Extraction as Sequential Tagging -- 6.2.2 Tag Prediction as Classification -- 6.2.3 The Maximum Entropy Method -- 6.2.4 Linguistic Features and Encoding -- 6.2.5 Sequential Probability Model -- 6.3 Coreference and Relationship Extraction -- 6.3.1 Coreference Resolution -- 6.3.2 Relationship Extraction -- 6.4 Template Filling and Database Construction -- 6.5 Applications -- 6.5.1 Information Retrieval -- 6.5.2 Commercial Extraction Systems -- 6.5.3 Criminal Justice -- 6.5.4 Intelligence -- 6.6 Historical and Bibliographical Remarks
7 Case Studies -- 7.1 Market Intelligence from the Web -- 7.2 Lightweight Document Matching for Digital Libraries -- 7.3 Generating Model Cases for Help Desk Applications -- 7.4 Assigning Topics to News Articles -- 7.5 E-mail Filtering -- 7.6 Search Engines -- 7.7 Extracting Named Entities from Documents -- 7.8 Customized Newspapers -- 7.9 Historical and Bibliographical Remarks
8 Emerging Directions -- 8.1 Summarization -- 8.2 Active Learning -- 8.3 Learning with Unlabeled Data -- 8.4 Different Ways of Collecting Samples -- 8.4.1 Multiple Samples and Voting Methods -- 8.4.2 Online Learning -- 8.4.3 Cost-Sensitive Learning -- 8.4.4 Unbalanced Samples and Rare Events -- 8.5 Question Answering -- 8.6 Historical and Bibliographical Remarks
Appendix: Software Notes -- A.1 Summary of Software -- A.2 Requirements -- A.3 Download Instructions
References -- Author Index -- Subject Index
URI http://ebooks.ciando.com/book/index.cfm/bok_id/42668
https://cir.nii.ac.jp/crid/1130282272545633024
https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=511455
http://link.springer.com/10.1007/978-0-387-34555-0