HedgePeer: A Dataset for Uncertainty Detection in Peer Reviews


Bibliographic Details
Published in	Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries, pp. 1-5
Main Authors	Ghosal, Tirthankar; Varanasi, Kamal Kaushik; Kordoni, Valia
Format	Conference Proceeding
Language	English
Published	ACM 20.06.2022
Subjects	Hedges; Information retrieval; Libraries; Linguistics; Measurement uncertainty; Peer Reviews; Predictive models; Reviewer Confidence; Uncertainty; Uncertainty Detection; Writing
DOI	10.1145/3529372.3533300

Abstract	Uncertainty detection from text is essential in many information retrieval (IR) applications. Detecting textual uncertainty helps separate factual information from uncertain or non-factual information. To avoid overly precise commitment, people use linguistic devices such as hedges (uncertain words or phrases). In peer reviews, reviewers often hedge when they are unsure of their opinion or when facts do not back it. Hedging can also indicate the reviewer's confidence, or degree of conviction, in the review. Reviewer confidence matters in the peer review process, especially to editors and chairs, for judging the quality of the evaluation of the paper under review. However, the self-annotated reviewer confidence score is often miscalibrated or biased and does not accurately represent the reviewer's conviction in their judgment of the paper's merit. Less confident reviewers sometimes speculate in their observations. In this paper, we introduce HedgePeer, a new uncertainty detection dataset of peer review comments that is more than five times larger than existing hedge detection datasets in other domains. We curate the dataset from open-access reviews available on the open review platform and annotate the review comments with hedge cues and hedge spans. We also provide several baseline approaches, including a multitask learning model that uses sentiment intensity and part-of-speech tagging as scaffold tasks to predict hedge cues and spans. We make our dataset and baseline code available at https://github.com/Tirthankar-Ghosal/HedgePeer-Dataset. Our dataset is motivated by the goal of computationally estimating the reviewer's conviction from their review texts. CCS CONCEPTS: Computing methodologies → Information extraction; Information systems → Information extraction.
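Illustration (not part of the record or the paper): the abstract distinguishes hedge cues (the uncertain word or phrase) from hedge spans (the text the cue applies to). The sketch below shows that distinction with a toy lexicon matcher. It is not the authors' baseline or their multitask model; the cue list, the function names find_hedge_cues and naive_span, and the clause-boundary span heuristic are all assumptions made for illustration, and the real annotation format of the dataset may differ.

```python
import re

# Hypothetical, tiny cue lexicon for illustration only.
# It is NOT the HedgePeer cue inventory.
HEDGE_CUES = ["might", "may", "seems", "perhaps", "possibly", "i think", "not sure"]

# Longest cues first so multi-word cues win over shorter overlapping ones.
_CUE_RE = re.compile(
    r"\b("
    + "|".join(re.escape(c) for c in sorted(HEDGE_CUES, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_hedge_cues(sentence):
    """Return (start, end, text) character offsets for each lexicon cue found."""
    return [(m.start(), m.end(), m.group(0)) for m in _CUE_RE.finditer(sentence)]


def naive_span(sentence, cue_end):
    """Crude span heuristic (an assumption, not the dataset's definition):
    take the text after the cue up to the next clause boundary or sentence end."""
    boundary = re.search(r"[,;.!?]", sentence[cue_end:])
    end = cue_end + boundary.start() if boundary else len(sentence)
    return sentence[cue_end:end].strip()


if __name__ == "__main__":
    review = "The results might be due to overfitting, but the paper is clear."
    for start, end, cue in find_hedge_cues(review):
        print(f"cue={cue!r} at [{start}, {end}) span={naive_span(review, end)!r}")
```

A lexicon matcher like this cannot tell hedging uses of a word from non-hedging ones, which is one reason to use learned models trained on annotated cues and spans instead.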
Author_xml	1. Ghosal, Tirthankar (ghosal@ufal.mff.cuni.cz), Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic
	2. Varanasi, Kamal Kaushik (1801ce31@iitp.ac.in), Indian Institute of Technology Patna, Department of Civil Engineering, Patna, Bihar, India
	3. Kordoni, Valia (kordonie@rz.hu-berlin.de), Humboldt-Universitaet zu Berlin, Department of English Studies, Berlin, Germany
EISBN	1450393454 9781450393454
PublicationTitleAbbrev	JCDL
URI	https://ieeexplore.ieee.org/document/9852963