HedgePeer: A Dataset for Uncertainty Detection in Peer Reviews


Bibliographic Details
Published in Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1-5
Main Authors Ghosal, Tirthankar; Varanasi, Kamal Kaushik; Kordoni, Valia
Format Conference Proceeding
Language English
Published ACM 20.06.2022
DOI 10.1145/3529372.3533300


Abstract Uncertainty detection from text is essential in many information retrieval (IR) applications. Detecting textual uncertainty helps separate factual information from uncertain or non-factual information. To avoid overprecise commitment, people use linguistic devices such as hedges (uncertain words or phrases). In peer reviews, reviewers often use hedges when they are unsure about their opinion or when facts do not back their opinions. The use of hedges or uncertain words can also indicate the reviewer's confidence, or measure of conviction, in their review. Reviewer confidence matters in the peer review process, especially to editors and chairs, for judging the quality of the evaluation of the paper under review. However, the self-annotated reviewer confidence score is often miscalibrated or biased and does not accurately represent the reviewer's conviction in their judgment of the paper's merit. Less confident reviewers sometimes speculate in their observations. In this paper, we introduce HedgePeer, a new uncertainty detection dataset of peer review comments, which is more than five times larger than the existing datasets on hedge detection in other domains. We curate our dataset from the open-access reviews available in the open review platform and annotate the review comments in terms of hedge cues and hedge spans. We also provide several baseline approaches, including a multitask learning model with sentiment intensity and parts-of-speech as scaffold tasks to predict hedge cues and spans. We make our dataset and baseline code available at https://github.com/Tirthankar-Ghosal/HedgePeer-Dataset. Our dataset is motivated by the goal of computationally estimating the reviewer's conviction from their review texts. CCS Concepts: Computing methodologies → Information extraction; Information systems → Information extraction.
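The distinction the abstract draws between a hedge cue (the uncertain word or phrase) and a hedge span (the text the cue qualifies) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cue lexicon, the function name `find_hedges`, and the rule that a span runs from the cue to the next clause boundary are hypothetical, and they reflect neither the HedgePeer annotation guidelines nor the baselines described in the paper.

```python
import re

# Illustrative cue lexicon (an assumption; not taken from the HedgePeer annotations).
HEDGE_CUES = ("may", "might", "could", "seems", "appears",
              "perhaps", "possibly", "i think", "not sure")

# Longest cues first so multi-word cues are tried before shorter ones.
_CUE_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in sorted(HEDGE_CUES, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)
# Naive clause boundary (an assumption): the next punctuation mark after the cue.
_CLAUSE_END = re.compile(r"[,.;!?]")


def find_hedges(sentence):
    """Return one dict per cue: cue text, cue character offsets, and a naive span."""
    results = []
    for m in _CUE_RE.finditer(sentence):
        boundary = _CLAUSE_END.search(sentence, m.end())
        span_end = boundary.start() if boundary else len(sentence)
        results.append({
            "cue": m.group(0),
            "cue_start": m.start(),
            "cue_end": m.end(),
            "span": sentence[m.start():span_end].strip(),
        })
    return results


if __name__ == "__main__":
    example = "The method seems novel, but the gains might come from tuning."
    for hedge in find_hedges(example):
        print(hedge["cue"], "->", hedge["span"])
```

A lexicon lookup like this cannot tell a hedging use of a word from a non-hedging one ("may" as permission, for instance), which is one reason annotated cue and span data is useful for training learned detectors.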
Author Kordoni, Valia
Ghosal, Tirthankar
Varanasi, Kamal Kaushik
Author_xml – sequence: 1
  givenname: Tirthankar
  surname: Ghosal
  fullname: Ghosal, Tirthankar
  email: ghosal@ufal.mff.cuni.cz
  organization: Charles University, Faculty of Mathematics and Physics, Institute of Formal And Applied Linguistics,Prague,Czech Republic
– sequence: 2
  givenname: Kamal Kaushik
  surname: Varanasi
  fullname: Varanasi, Kamal Kaushik
  email: 1801ce31@iitp.ac.in
  organization: Indian Institute of Technology Patna,Department of Civil Engineering,Patna,Bihar,India
– sequence: 3
  givenname: Valia
  surname: Kordoni
  fullname: Kordoni, Valia
  email: kordonie@rz.hu-berlin.de
  organization: Humboldt-Universitaet zu Berlin,Department of English Studies,Berlin,Germany
ContentType Conference Proceeding
DOI 10.1145/3529372.3533300
EISBN 1450393454
9781450393454
EndPage 5
ExternalDocumentID 9852963
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 5
PublicationCentury 2000
PublicationDate 2022-June-20
PublicationDateYYYYMMDD 2022-06-20
PublicationDate_xml – month: 06
  year: 2022
  text: 2022-June-20
  day: 20
PublicationDecade 2020
PublicationTitle Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries
PublicationTitleAbbrev JCDL
PublicationYear 2022
Publisher ACM
StartPage 1
SubjectTerms Hedges
Information retrieval
Libraries
Linguistics
Measurement uncertainty
Peer Reviews
Predictive models
Reviewer Confidence
Uncertainty
Uncertainty Detection
Writing
Title HedgePeer: A Dataset for Uncertainty Detection in Peer Reviews
URI https://ieeexplore.ieee.org/document/9852963