Collaborative annotation for reliable natural language processing : technical and sociological aspects

This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, …

Bibliographic Details
Main Author: Fort, Karën
Format: eBook; Book
Language: English
Published: London: ISTE; Hoboken, N.J.: J. Wiley & Sons, 2016
Publisher: John Wiley & Sons, Incorporated; Wiley-Blackwell; Wiley-ISTE
Edition: 1

Table of Contents:
  • Cover -- Title Page -- Copyright -- Contents -- Preface -- List of Acronyms -- Introduction -- I.1. Natural Language Processing and manual annotation: Dr Jekyll and Mr Hyde? -- I.1.1. Where linguistics hides -- I.1.2. What is annotation? -- I.1.3. New forms, old issues -- I.2. Rediscovering annotation -- I.2.1. A rise in diversity and complexity -- I.2.2. Redefining manual annotation costs -- 1: Annotating Collaboratively -- 1.1. The annotation process (re)visited -- 1.1.1. Building consensus -- 1.1.2. Existing methodologies -- 1.1.3. Preparatory work -- 1.1.3.1. Identifying the actors -- 1.1.3.2. Taking the corpus into account -- 1.1.3.3. Creating and modifying the annotation guide -- 1.1.4. Pre-campaign -- 1.1.4.1. Building the mini-reference -- 1.1.4.2. Training the annotators -- 1.1.5. Annotation -- 1.1.5.1. Breaking-in -- 1.1.5.2. Annotating -- 1.1.5.3. Updating -- 1.1.6. Finalization -- 1.1.6.1. Failure -- 1.1.6.2. Adjudication -- 1.1.6.3. Reviewing -- 1.1.6.4. Publication -- 1.2. Annotation complexity -- 1.2.1. Example overview -- 1.2.1.1. Example 1: POS -- 1.2.1.2. Example 2: gene renaming -- 1.2.1.3. Example 3: structured named entities -- 1.2.2. What to annotate? -- 1.2.2.1. Discrimination -- 1.2.2.2. Delimitation -- 1.2.3. How to annotate? -- 1.2.3.1. Expressiveness of the annotation language -- 1.2.3.2. Tagset dimension -- 1.2.3.3. Degree of ambiguity -- 1.2.3.3.1. Residual ambiguity -- 1.2.3.3.2. Theoretical ambiguity -- 1.2.4. The weight of the context -- 1.2.5. Visualization -- 1.2.6. Elementary annotation tasks -- 1.2.6.1. Identifying gene names -- 1.2.6.2. Annotating gene renaming relations -- 1.3. Annotation tools -- 1.3.1. To be or not to be an annotation tool -- 1.3.2. Much more than prototypes -- 1.3.2.1. Taking the annotators into account -- 1.3.2.2. Standardizing the formalisms
  • 1.3.3. Addressing the new annotation challenges -- 1.3.3.1. Towards more flexible and more generic tools -- 1.3.3.2. Towards more collaborative annotation -- 1.3.3.3. Towards the annotation campaign management -- 1.3.4. The impossible dream tool -- 1.4. Evaluating the annotation quality -- 1.4.1. What is annotation quality? -- 1.4.2. Understanding the basics -- 1.4.2.1. How lucky can you get? -- 1.4.2.2. The kappa family -- 1.4.2.2.1. Scott's pi -- 1.4.2.2.2. Cohen's kappa -- 1.4.2.3. The dark side of kappas -- 1.4.2.4. The F-measure: proceed with caution -- 1.4.3. Beyond kappas -- 1.4.3.1. Weighted coefficients -- 1.4.3.2. γ: the (nearly) universal metrics -- 1.4.4. Giving meaning to the metrics -- 1.4.4.1. The Corpus Shuffling Tool -- 1.4.4.2. Experimental results -- 1.4.4.2.1. Artificial annotations -- 1.4.4.2.2. Annotations from a real corpus -- 1.5. Conclusion -- 2: Crowdsourcing Annotation -- 2.1. What is crowdsourcing and why should we be interested in it? -- 2.1.1. A moving target -- 2.1.2. A massive success -- 2.2. Deconstructing the myths -- 2.2.1. Crowdsourcing is a recent phenomenon -- 2.2.2. Crowdsourcing involves a crowd (of non-experts) -- 2.2.3. "Crowdsourcing involves (a crowd of) non-experts" -- 2.3. Playing with a purpose -- 2.3.1. Using the players' innate capabilities and world knowledge -- 2.3.2. Using the players' school knowledge -- 2.3.3. Using the players' learning capacities -- 2.4. Acknowledging crowdsourcing specifics -- 2.4.1. Motivating the participants -- 2.4.2. Producing quality data -- 2.5. Ethical issues -- 2.5.1. Game ethics -- 2.5.2. What's wrong with Amazon Mechanical Turk? -- 2.5.3. A charter to rule them all -- Conclusion -- Appendix: (Some) Annotation Tools -- A.1. Generic tools -- A.1.1. Cadixe -- A.1.2. Callisto -- A.1.3. Amazon Mechanical Turk -- A.1.4. Knowtator -- A.1.5. MMAX2 -- A.1.6. UAM CorpusTool
  • A.1.7. Glozz -- A.1.8. CCASH -- A.1.9. brat -- A.2. Task-oriented tools -- A.2.1. LDC tools -- A.2.2. EasyRef -- A.2.3. Phrase Detectives -- A.2.4. ZombiLingo -- A.3. NLP annotation platforms -- A.3.1. GATE -- A.3.2. EULIA -- A.3.3. UIMA -- A.3.4. SYNC3 -- A.4. Annotation management tools -- A.4.1. Slate -- A.4.2. Djangology -- A.4.3. GATE Teamware -- A.4.4. WebAnno -- A.5. (Many) Other tools -- Glossary -- Bibliography -- Index -- Other titles from ISTE in Cognitive Science and Knowledge Management -- EULA