Collaborative annotation for reliable natural language processing: technical and sociological aspects
This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field.
Main Author | Fort, Karën |
---|---|
Format | eBook Book |
Language | English |
Published | London : ISTE ; Hoboken, N.J : John Wiley & Sons, 2016 |
Edition | 1 |
Subjects | Natural language processing (Computer science) |
Online Access | https://hal.science/hal-01324322 |
Abstract | This book presents a unique opportunity for constructing a consistent image of collaborative manual annotation for Natural Language Processing (NLP). NLP has witnessed two major evolutions in the past 25 years: firstly, the extraordinary success of machine learning, which is now, for better or for worse, overwhelmingly dominant in the field, and secondly, the multiplication of evaluation campaigns or shared tasks. Both involve manually annotated corpora, for the training and evaluation of the systems. These corpora have progressively become the hidden pillars of our domain, providing food for our hungry machine learning algorithms and reference for evaluation. Annotation is now the place where linguistics hides in NLP. However, manual annotation has largely been ignored for some time, and it has taken a while even for annotation guidelines to be recognized as essential. Although some efforts have been made lately to address some of the issues presented by manual annotation, there has still been little research done on the subject. This book aims to provide some useful insights into the subject. Manual corpus annotation is now at the heart of NLP, and is still largely unexplored. There is a need for manual annotation engineering (in the sense of a precisely formalized process), and this book aims to provide a first step towards a holistic methodology, with a global view on annotation. |
---|---|
Author | Fort, Karën |
Copyright | Distributed under a Creative Commons Attribution 4.0 International License |
DEWEY | 006.3/5 |
DOI | 10.1002/9781119306696 |
EISBN | 1119307643 9781119307648 9781119307655 1119307651 |
ISBN | 1848219040 9781848219045 |
Keywords | annotation; inter-annotator agreement; crowdsourcing; ethics |
LCCN | 2016936602 |
LCCallNum_Ident | QA76.9.N38 .F678 2016 |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
Notes | Bibliography: p. [143]-162. Includes index. |
OCLC | 951809856 |
ORCID | 0000-0002-0723-8850 |
OpenAccessLink | https://hal.science/hal-01324322 |
PageCount | 196 |
SubjectTerms | Computer Science Document and Text Processing Natural language processing (Computer science) |
TableOfContents | Cover -- Title Page -- Copyright -- Contents -- Preface -- List of Acronyms -- Introduction -- I.1. Natural Language Processing and manual annotation: Dr Jekyll and Mr Hyde? -- I.1.1. Where linguistics hides -- I.1.2. What is annotation? -- I.1.3. New forms, old issues -- I.2. Rediscovering annotation -- I.2.1. A rise in diversity and complexity -- I.2.2. Redefining manual annotation costs -- 1: Annotating Collaboratively -- 1.1. The annotation process (re)visited -- 1.1.1. Building consensus -- 1.1.2. Existing methodologies -- 1.1.3. Preparatory work -- 1.1.3.1. Identifying the actors -- 1.1.3.2. Taking the corpus into account -- 1.1.3.3. Creating and modifying the annotation guide -- 1.1.4. Pre-campaign -- 1.1.4.1. Building the mini-reference -- 1.1.4.2. Training the annotators -- 1.1.5. Annotation -- 1.1.5.1. Breaking-in -- 1.1.5.2. Annotating -- 1.1.5.3. Updating -- 1.1.6. Finalization -- 1.1.6.1. Failure -- 1.1.6.2. Adjudication -- 1.1.6.3. Reviewing -- 1.1.6.4. Publication -- 1.2. Annotation complexity -- 1.2.1. Example overview -- 1.2.1.1. Example 1: POS -- 1.2.1.2. Example 2: gene renaming -- 1.2.1.3. Example 3: structured named entities -- 1.2.2. What to annotate? -- 1.2.2.1. Discrimination -- 1.2.2.2. Delimitation -- 1.2.3. How to annotate? -- 1.2.3.1. Expressiveness of the annotation language -- 1.2.3.2. Tagset dimension -- 1.2.3.3. Degree of ambiguity -- 1.2.3.3.1. Residual ambiguity -- 1.2.3.3.2. Theoretical ambiguity -- 1.2.4. The weight of the context -- 1.2.5. Visualization -- 1.2.6. Elementary annotation tasks -- 1.2.6.1. Identifying gene names -- 1.2.6.2. Annotating gene renaming relations -- 1.3. Annotation tools -- 1.3.1. To be or not to be an annotation tool -- 1.3.2. Much more than prototypes -- 1.3.2.1. Taking the annotators into account -- 1.3.2.2. Standardizing the formalisms -- 1.3.3. Addressing the new annotation challenges -- 1.3.3.1. Towards more flexible and more generic tools -- 1.3.3.2. 
Towards more collaborative annotation -- 1.3.3.3. Towards the annotation campaign management -- 1.3.4. The impossible dream tool -- 1.4. Evaluating the annotation quality -- 1.4.1. What is annotation quality? -- 1.4.2. Understanding the basics -- 1.4.2.1. How lucky can you get? -- 1.4.2.2. The kappa family -- 1.4.2.2.1. Scott's pi -- 1.4.2.2.2. Cohen's kappa -- 1.4.2.3. The dark side of kappas -- 1.4.2.4. The F-measure: proceed with caution -- 1.4.3. Beyond kappas -- 1.4.3.1. Weighted coefficients -- 1.4.3.2. γ: the (nearly) universal metrics -- 1.4.4. Giving meaning to the metrics -- 1.4.4.1. The Corpus Shuffling Tool -- 1.4.4.2. Experimental results -- 1.4.4.2.1. Artificial annotations -- 1.4.4.2.2. Annotations from a real corpus -- 1.5. Conclusion -- 2: Crowdsourcing Annotation -- 2.1. What is crowdsourcing and why should we be interested in it? -- 2.1.1. A moving target -- 2.1.2. A massive success -- 2.2. Deconstructing the myths -- 2.2.1. Crowdsourcing is a recent phenomenon -- 2.2.2. Crowdsourcing involves a crowd (of non-experts) -- 2.2.3. "Crowdsourcing involves (a crowd of) non-experts" -- 2.3. Playing with a purpose -- 2.3.1. Using the players' innate capabilities and world knowledge -- 2.3.2. Using the players' school knowledge -- 2.3.3. Using the players' learning capacities -- 2.4. Acknowledging crowdsourcing specifics -- 2.4.1. Motivating the participants -- 2.4.2. Producing quality data -- 2.5. Ethical issues -- 2.5.1. Game ethics -- 2.5.2. What's wrong with Amazon Mechanical Turk? -- 2.5.3. A charter to rule them all -- Conclusion -- Appendix: (Some) Annotation Tools -- A.1. Generic tools -- A.1.1. Cadixe -- A.1.2. Callisto -- A.1.3. Amazon Mechanical Turk -- A.1.4. Knowtator -- A.1.5. MMAX2 -- A.1.6. UAM CorpusTool A.1.7. Glozz -- A.1.8. CCASH -- A.1.9. brat -- A.2. Task-oriented tools -- A.2.1. LDC tools -- A.2.2. EasyRef -- A.2.3. Phrase Detectives -- A.2.4. ZombiLingo -- A.3. NLP annotation platforms -- A.3.1. GATE -- A.3.2. EULIA -- A.3.3. 
UIMA -- A.3.4. SYNC3 -- A.4. Annotation management tools -- A.4.1. Slate -- A.4.2. Djangology -- A.4.3. GATE Teamware -- A.4.4. WebAnno -- A.5. (Many) Other tools -- Glossary -- Bibliography -- Index -- Other titles from ISTE in Cognitive Science and Knowledge Management -- EULA |
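The table of contents above covers the "kappa family" of inter-annotator agreement metrics (section 1.4.2.2). As a minimal illustration of the standard Cohen's kappa formula, κ = (p_o − p_e) / (1 − p_e), here is a short sketch with hypothetical POS-style labels (the data and function name are invented for the example, not taken from the book):

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance, computed
    from each annotator's own label distribution.
    """
    assert len(ann_a) == len(ann_b) and ann_a
    n = len(ann_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Chance agreement from the two marginal label distributions.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    labels = set(ann_a) | set(ann_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    if p_e == 1.0:  # degenerate case: both annotators use one label
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations from two annotators over six tokens.
a = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
b = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
print(round(cohens_kappa(a, b), 3))  # 0.478
```

Scott's pi, also listed in the TOC, differs only in computing p_e from the pooled label distribution of both annotators rather than from each annotator's individual marginals.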
Title | Collaborative annotation for reliable natural language processing : technical and sociological aspects |
URI | https://cir.nii.ac.jp/crid/1130282273284077312 https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=4558125 https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781119307648&uid=none https://www.vlebooks.com/vleweb/product/openreader?id=none&isbn=9781119307655 https://hal.science/hal-01324322 |