Collaborative annotation for reliable natural language processing: technical and sociological aspects
This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field.
Main Author | Fort, Karën |
---|---|
Format | eBook Book |
Language | English |
Published | London : ISTE ; Hoboken, N.J : John Wiley & Sons, 2016 |
Edition | 1 |
Subjects | Natural language processing (Computer science) |
Online Access | https://hal.science/hal-01324322 |
Abstract | This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field, and secondly, the multiplication of evaluation campaigns or shared tasks. Both involve manually annotated corpora, for the training and evaluation of the systems. These corpora have progressively become the hidden pillars of our domain, providing food for our hungry machine learning algorithms and reference for evaluation. Annotation is now the place where linguistics hides in NLP. However, manual annotation has largely been ignored for some time, and it has taken a while even for annotation guidelines to be recognized as essential. Although some efforts have been made lately to address some of the issues presented by manual annotation, there has still been little research done on the subject. This book aims to provide some useful insights into the subject. Manual corpus annotation is now at the heart of NLP, and is still largely unexplored. There is a need for manual annotation engineering (in the sense of a precisely formalized process), and this book aims to provide a first step towards a holistic methodology, with a global view on annotation. |
---|---|
Author | Fort, Karën |
Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
DEWEY | 006.3/5 |
DOI | 10.1002/9781119306696 |
EISBN | 1119307643 9781119307648 9781119307655 1119307651 |
ISBN | 1848219040 9781848219045 |
Keywords | annotation; inter-annotator agreement; crowdsourcing; ethics |
LCCN | 2016936602 |
LCCallNum_Ident | QA76.9.N38 .F678 2016 |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
Notes | Bibliography: p. [143]-162. Includes index. |
OCLC | 951809856 |
ORCID | 0000-0002-0723-8850 |
OpenAccessLink | https://hal.science/hal-01324322 |
PageCount | 196 |
SubjectTerms | Computer Science Document and Text Processing Natural language processing (Computer science) |
TableOfContents | Cover -- Title Page -- Copyright -- Contents -- Preface -- List of Acronyms -- Introduction -- I.1. Natural Language Processing and manual annotation: Dr Jekyll and Mr Hyde? -- I.1.1. Where linguistics hides -- I.1.2. What is annotation? -- I.1.3. New forms, old issues -- I.2. Rediscovering annotation -- I.2.1. A rise in diversity and complexity -- I.2.2. Redefining manual annotation costs -- 1: Annotating Collaboratively -- 1.1. The annotation process (re)visited -- 1.1.1. Building consensus -- 1.1.2. Existing methodologies -- 1.1.3. Preparatory work -- 1.1.3.1. Identifying the actors -- 1.1.3.2. Taking the corpus into account -- 1.1.3.3. Creating and modifying the annotation guide -- 1.1.4. Pre-campaign -- 1.1.4.1. Building the mini-reference -- 1.1.4.2. Training the annotators -- 1.1.5. Annotation -- 1.1.5.1. Breaking-in -- 1.1.5.2. Annotating -- 1.1.5.3. Updating -- 1.1.6. Finalization -- 1.1.6.1. Failure -- 1.1.6.2. Adjudication -- 1.1.6.3. Reviewing -- 1.1.6.4. Publication -- 1.2. Annotation complexity -- 1.2.1. Example overview -- 1.2.1.1. Example 1: POS -- 1.2.1.2. Example 2: gene renaming -- 1.2.1.3. Example 3: structured named entities -- 1.2.2. What to annotate? -- 1.2.2.1. Discrimination -- 1.2.2.2. Delimitation -- 1.2.3. How to annotate? -- 1.2.3.1. Expressiveness of the annotation language -- 1.2.3.2. Tagset dimension -- 1.2.3.3. Degree of ambiguity -- 1.2.3.3.1. Residual ambiguity -- 1.2.3.3.2. Theoretical ambiguity -- 1.2.4. The weight of the context -- 1.2.5. Visualization -- 1.2.6. Elementary annotation tasks -- 1.2.6.1. Identifying gene names -- 1.2.6.2. Annotating gene renaming relations -- 1.3. Annotation tools -- 1.3.1. To be or not to be an annotation tool -- 1.3.2. Much more than prototypes -- 1.3.2.1. Taking the annotators into account -- 1.3.2.2. Standardizing the formalisms -- 1.3.3. Addressing the new annotation challenges -- 1.3.3.1. Towards more flexible and more generic tools -- 1.3.3.2. 
Towards more collaborative annotation -- 1.3.3.3. Towards the annotation campaign management -- 1.3.4. The impossible dream tool -- 1.4. Evaluating the annotation quality -- 1.4.1. What is annotation quality? -- 1.4.2. Understanding the basics -- 1.4.2.1. How lucky can you get? -- 1.4.2.2. The kappa family -- 1.4.2.2.1. Scott's pi -- 1.4.2.2.2. Cohen's kappa -- 1.4.2.3. The dark side of kappas -- 1.4.2.4. The F-measure: proceed with caution -- 1.4.3. Beyond kappas -- 1.4.3.1. Weighted coefficients -- 1.4.3.2. γ: the (nearly) universal metrics -- 1.4.4. Giving meaning to the metrics -- 1.4.4.1. The Corpus Shuffling Tool -- 1.4.4.2. Experimental results -- 1.4.4.2.1. Artificial annotations -- 1.4.4.2.2. Annotations from a real corpus -- 1.5. Conclusion -- 2: Crowdsourcing Annotation -- 2.1. What is crowdsourcing and why should we be interested in it? -- 2.1.1. A moving target -- 2.1.2. A massive success -- 2.2. Deconstructing the myths -- 2.2.1. Crowdsourcing is a recent phenomenon -- 2.2.2. Crowdsourcing involves a crowd (of non-experts) -- 2.2.3. "Crowdsourcing involves (a crowd of) non-experts" -- 2.3. Playing with a purpose -- 2.3.1. Using the players' innate capabilities and world knowledge -- 2.3.2. Using the players' school knowledge -- 2.3.3. Using the players' learning capacities -- 2.4. Acknowledging crowdsourcing specifics -- 2.4.1. Motivating the participants -- 2.4.2. Producing quality data -- 2.5. Ethical issues -- 2.5.1. Game ethics -- 2.5.2. What's wrong with Amazon Mechanical Turk? -- 2.5.3. A charter to rule them all -- Conclusion -- Appendix: (Some) Annotation Tools -- A.1. Generic tools -- A.1.1. Cadixe -- A.1.2. Callisto -- A.1.3. Amazon Mechanical Turk -- A.1.4. Knowtator -- A.1.5. MMAX2 -- A.1.6. UAM CorpusTool A.1.7. Glozz -- A.1.8. CCASH -- A.1.9. brat -- A.2. Task-oriented tools -- A.2.1. LDC tools -- A.2.2. EasyRef -- A.2.3. Phrase Detectives -- A.2.4. ZombiLingo -- A.3. NLP annotation platforms -- A.3.1. GATE -- A.3.2. EULIA -- A.3.3. 
UIMA -- A.3.4. SYNC3 -- A.4. Annotation management tools -- A.4.1. Slate -- A.4.2. Djangology -- A.4.3. GATE Teamware -- A.4.4. WebAnno -- A.5. (Many) Other tools -- Glossary -- Bibliography -- Index -- Other titles from ISTE in Cognitive Science and Knowledge Management -- EULA |
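The table of contents above covers the "kappa family" of inter-annotator agreement metrics (section 1.4.2.2). As a minimal illustration of the standard Cohen's kappa formula, κ = (p_o − p_e) / (1 − p_e), here is a short sketch with hypothetical POS-style labels (the data and function name are invented for the example, not taken from the book):

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance, computed
    from each annotator's own label distribution.
    """
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Chance agreement from the two marginal label distributions.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    labels = set(ann_a) | set(ann_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    if p_e == 1.0:  # degenerate case: both annotators use one label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations from two annotators over six tokens.
a = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
b = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
print(round(cohens_kappa(a, b), 3))  # 0.478
```

Scott's pi, also listed in the TOC, differs only in computing p_e from the pooled label distribution of both annotators rather than from each annotator's individual marginals.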
Title | Collaborative annotation for reliable natural language processing : technical and sociological aspects |
URI | https://cir.nii.ac.jp/crid/1130282273284077312 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=4558125 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781119307648&uid=none https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781119307655 https://hal.science/hal-01324322 |