Text Mining: Predictive Methods for Analyzing Unstructured Information
Main Authors | Weiss, Sholom M; Indurkhya, Nitin; Zhang, Tong; Damerau, Fred |
---|---|
Format | eBook Book |
Language | English |
Published | New York, NY Springer-Verlag 2004 Springer Springer New York |
Edition | 1st ed. |
Subjects | |
Online Access | Get full text |
ISBN | 0387954333 9780387954332 |
DOI | 10.1007/978-0-387-34555-0 |
Abstract | The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly available information. Huge amounts of information are private and are of interest to local communities, such as the records of customers of a business. This information is overwhelmingly text and serves a record-keeping purpose, but an automated analysis might be desirable to find patterns in the stored records. Text mining is analogous to data mining: it also finds patterns and trends in information samples, but it works with far less structured ingredients, though ones of greater immediate utility to users. This book focuses on the concepts and methods needed to expand horizons beyond structured, numeric data to automated mining of text samples. It introduces the new world of text mining and examines proven methods for various critical text-mining tasks, such as automated document indexing and information retrieval and search. It also explores new research areas that rely on evolving text-mining techniques, such as information extraction and document summarization. TOC: Overview of text mining.- From textual information to numerical vectors.- Using text for prediction.- Information retrieval and text mining.- Finding structure in a document collection.- Looking for information in documents.- Case studies.- Emerging directions.- Appendix: software notes.- References.- Author and subject indexes. |
---|---|
Author | Indurkhya, Nitin Zhang, Tong Weiss, Sholom M Damerau, Fred |
Author_xml | – sequence: 1 fullname: Damerau, Fred – sequence: 2 fullname: Indurkhya, Nitin – sequence: 3 fullname: Weiss, Sholom M – sequence: 4 fullname: Zhang, Tong |
BackLink | https://cir.nii.ac.jp/crid/1130282272545633024$$DView record in CiNii |
ContentType | eBook Book |
Copyright | Springer-Verlag New York 2005 |
Copyright_xml | – notice: Springer-Verlag New York 2005 |
DBID | 08O RYH |
DEWEY | 006.312 |
DOI | 10.1007/978-0-387-34555-0 |
DatabaseName | ciando eBooks CiNii Complete |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 9780387345550 0387345558 |
Edition | 1. Aufl. 1 |
ExternalDocumentID | 74036 EBC511455 BA70030520 ciando42668 |
GroupedDBID | -T. 08O 0D6 0DA 0E8 20A 38. 64P AABBV AABSQ AAIER AAJYQ AATVQ AAUKK ABBRO ABBUY ABCYT ABMNI ACAMX ACBPT ACDTA ACDUY ADVHH AEHEY AEJLV AEKFX AETDV AEZAY AFUVA AHMWK AHNNE ALMA_UNASSIGNED_HOLDINGS ATJMZ AZZ BBABE CZZ E6I IEZ JJU LTD MYL NUD SBO SUFPE SVJCK TPJZQ UR3 Z5O Z7R Z7U Z7W Z7X Z7Z Z81 Z83 Z84 Z85 Z87 Z88 RYH |
ID | FETCH-LOGICAL-a64805-af7c8ac2e18f5843623886fc541aa34f08d9015cca10a1e77db7f253b62158123 |
ISBN | 0387954333 9780387954332 |
IngestDate | Tue Jul 29 19:53:24 EDT 2025 Fri May 30 21:09:56 EDT 2025 Fri Jun 27 00:08:35 EDT 2025 Mon Jun 30 03:59:41 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | false |
Keywords | Computer Informatik |
LCCN | 2004056524 |
LCCallNum_Ident | QA76.9.D343 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-a64805-af7c8ac2e18f5843623886fc541aa34f08d9015cca10a1e77db7f253b62158123 |
Notes | Includes bibliographical references (p. [217]-228) and indexes |
OCLC | 525024974 |
PQID | EBC511455 |
PageCount | 247 |
ParticipantIDs | springer_books_10_1007_978_0_387_34555_0 proquest_ebookcentral_EBC511455 nii_cinii_1130282272545633024 ciando_primary_ciando42668 |
PublicationCentury | 2000 |
PublicationDate | 2004 c2005 2005 |
PublicationDateYYYYMMDD | 2004-01-01 2005-01-01 |
PublicationDate_xml | – year: 2004 text: 2004 |
PublicationDecade | 2000 |
PublicationPlace | New York, NY |
PublicationPlace_xml | – name: New York – name: New York, NY |
PublicationYear | 2004 2005 |
Publisher | Springer-Verlag Springer Springer New York |
Publisher_xml | – name: Springer-Verlag – name: Springer – name: Springer New York |
SSID | ssj0000320347 |
Score | 2.52048 |
Snippet | The growth of the web can be seen as an expanding public digital library collection. Online digital information extends far beyond the web and its publicly... |
SourceID | springer proquest nii ciando |
SourceType | Publisher |
SubjectTerms | Computer Appl. in Administrative Data Processing Computer Science Data mining Data Mining and Knowledge Discovery Database Management Information Storage and Retrieval Information Systems and Communication Service Natural Language Processing (NLP) |
Subtitle | Predictive Methods for Analyzing Unstructured Information |
TableOfContents | Intro -- CONTENTS -- Preface -- 1 Overview of Text Mining -- 1.1 What's Special about Text Mining? -- 1.1.1 Structured or Unstructured Data? -- 1.1.2 Is Text Different from Numbers? -- 1.2 What Types of Problems Can Be Solved? -- 1.3 Document Classification -- 1.4 Information Retrieval -- 1.5 Clustering and Organizing Documents -- 1.6 Information Extraction -- 1.7 Prediction and Evaluation -- 1.8 The Next Chapters -- 1.9 Historical and Bibliographical Remarks -- 2 From Textual Information to Numerical Vectors -- 2.1 Collecting Documents -- 2.2 Document Standardization -- 2.3 Tokenization -- 2.4 Lemmatization -- 2.4.1 Inflectional Stemming -- 2.4.2 Stemming to a Root -- 2.5 Vector Generation for Prediction -- 2.5.1 Multiword Features -- 2.5.2 Labels for the Right Answers -- 2.5.3 Feature Selection by Attribute Ranking -- 2.6 Sentence Boundary Determination -- 2.7 Part-Of-Speech Tagging -- 2.8 Word Sense Disambiguation -- 2.9 Phrase Recognition -- 2.10 Named Entity Recognition -- 2.11 Parsing -- 2.12 Feature Generation -- 2.13 Historical and Bibliographical Remarks -- 3 Using Text for Prediction -- 3.1 Recognizing that Documents Fit a Pattern -- 3.2 How Many Documents Are Enough? -- 3.3 Document Classification -- 3.4 Learning to Predict from Text -- 3.4.1 Similarity and Nearest-Neighbor Methods -- 3.4.2 Document Similarity -- 3.4.3 Decision Rules -- 3.4.3.1 How to Find the Best Decision Rules -- 3.4.4 Scoring by Probabilities -- 3.4.5 Linear Scoring Methods -- 3.4.5.1 How to Find the Best Scoring Model -- 3.5 Evaluation of Performance -- 3.5.1 Estimating Current and Future Performance -- 3.5.2 Getting the Most from a Learning Method -- 3.6 Applications -- 3.7 Historical and Bibliographical Remarks -- 4 Information Retrieval and Text Mining -- 4.1 Is Information Retrieval a Form of Text Mining? -- 4.2 Key Word Search -- 4.3 Nearest-Neighbor Methods -- 4.4 Measuring Similarity -- 4.4.1 Shared Word Count -- 4.4.2 Word Count and Bonus -- 4.4.3 Cosine Similarity -- 4.5 Web-Based Document Search -- 4.5.1 Link Analysis -- 4.6 Document Matching -- 4.7 Inverted Lists -- 4.8 Evaluation of Performance -- 4.9 Historical and Bibliographical Remarks -- 5 Finding Structure in a Document Collection -- 5.1 Clustering Documents by Similarity -- 5.2 Similarity of Composite Documents -- 5.2.1 k-Means Clustering -- 5.2.1.1 Centroid Classifier -- 5.2.2 Hierarchical Clustering -- 5.2.3 The EM Algorithm -- 5.3 What Do a Cluster's Labels Mean? -- 5.4 Applications -- 5.5 Evaluation of Performance -- 5.6 Historical and Bibliographical Remarks -- 6 Looking for Information in Documents -- 6.1 Goals of Information Extraction -- 6.2 Finding Patterns and Entities from Text -- 6.2.1 Entity Extraction as Sequential Tagging -- 6.2.2 Tag Prediction as Classification -- 6.2.3 The Maximum Entropy Method -- 6.2.4 Linguistic Features and Encoding -- 6.2.5 Sequential Probability Model -- 6.3 Coreference and Relationship Extraction -- 6.3.1 Coreference Resolution -- 6.3.2 Relationship Extraction -- 6.4 Template Filling and Database Construction -- 6.5 Applications -- 6.5.1 Information Retrieval -- 6.5.2 Commercial Extraction Systems -- 6.5.3 Criminal Justice -- 6.5.4 Intelligence -- 6.6 Historical and Bibliographical Remarks -- 7 Case Studies -- 7.1 Market Intelligence from the Web -- 7.2 Lightweight Document Matching for Digital Libraries -- 7.3 Generating Model Cases for Help Desk Applications -- 7.4 Assigning Topics to News Articles -- 7.5 E-mail Filtering -- 7.6 Search Engines -- 7.7 Extracting Named Entities from Documents -- 7.8 Customized Newspapers -- 7.9 Historical and Bibliographical Remarks -- 8 Emerging Directions -- 8.1 Summarization -- 8.2 Active Learning -- 8.3 Learning with Unlabeled Data -- 8.4 Different Ways of Collecting Samples -- 8.4.1 Multiple Samples and Voting Methods -- 8.4.2 Online Learning -- 8.4.3 Cost-Sensitive Learning -- 8.4.4 Unbalanced Samples and Rare Events -- 8.5 Question Answering -- 8.6 Historical and Bibliographical Remarks -- Appendix: Software Notes -- A.1 Summary of Software -- A.2 Requirements -- A.3 Download Instructions -- References -- Author Index -- Subject Index |
Title | Text Mining |
URI | http://ebooks.ciando.com/book/index.cfm/bok_id/42668 https://cir.nii.ac.jp/crid/1130282272545633024 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=511455 http://link.springer.com/10.1007/978-0-387-34555-0 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwnV1LS8QwEA66Xjz5xvWZgwdBKm2SJl1vKqsirheft5C2KSzo7uLjoL_emTRtd5cF0UtoQmnKfO3ky8xkhpADaUVHskwGkVAiECYNA8MZDzjLQxtmlil3PLp3K68exPVz_NxUIHSnSz7S4-x75rmS_6AKY4ArnpL9A7L1Q2EArgFfaAFhaKfIb9314IJKPXp1tR3cnn70hv4WFwVUloR-99GR5uXrG2_69IliPzHc3CdLHXfBP9l-WT39DrXh61HveMIcEE-ZAypz4MQ2EV3UnRgzlc1Umk2cRJlql4s4huc2K0Qdt3d2qtymg4XzZF4p0CILp93rm8faqoXl2LlQ3i3u5uQ-zVH9DpVveSK9r58TeAAac_IhLPWDfn-C9k95qh0BuF8mLTwUskLm7GCVLFWlMKjXjGvkEhGhJSL0hDZ4UI8HBZHTGg86jgcdw2OdPF5078-vAl-jIjBSJCB-U6gsMRmzUVIAmQM-wJNEFlksImO4KMIkR8oFP0oUmsgqlaeqYDFPJZAtYFd8g7QGw4HdJDSzkYm4tIkVTOSwErGMMQNK0xgrO6lsk61SOHpUZiLRZRdJVtImuyAwGME2Qnc0UD_FkCBz6Ig22a9EqZ0f3gf_6u7ZOZBuEH-bHFYS1njDu65SWgNMOtQAk3Yw6XDrl8m2yWLzbe6QFkjU7gJ5-0j3_BfzAwWqN_k |
linkProvider | Library Specific Holdings |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=book&rft.title=Text+mining+%3A+predictive+methods+for+analyzing+unstructured+information&rft.au=Weiss%2C+Sholom+M.&rft.date=2005-01-01&rft.pub=Springer&rft.isbn=9780387954332&rft_id=info:doi/10.1007%2F978-0-387-34555-0&rft.externalDocID=BA70030520 |
thumbnail_s | http://utb.summon.serialssolutions.com/2.0.0/image/custom?url=https%3A%2F%2Fmedia.springernature.com%2Fw306%2Fspringer-static%2Fcover-hires%2Fbook%2F978-0-387-34555-0 |