HedgePeer: A Dataset for Uncertainty Detection in Peer Reviews
Published in | Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries pp. 1 - 5 |
---|---|
Main Authors | Ghosal, Tirthankar; Varanasi, Kamal Kaushik; Kordoni, Valia |
Format | Conference Proceeding |
Language | English |
Published | ACM, 20.06.2022 |
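Subjects | Hedges; Information retrieval; Libraries; Linguistics; Measurement uncertainty; Peer Reviews; Predictive models; Reviewer Confidence; Uncertainty; Uncertainty Detection; Writing |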
DOI | 10.1145/3529372.3533300 |
Abstract | Uncertainty detection from text is essential in many information retrieval (IR) applications. Detecting textual uncertainty helps extract factual information rather than uncertain or non-factual information. To avoid overprecise commitment, people use linguistic devices such as hedges (uncertain words or phrases). In peer reviews, reviewers often use hedges when they are unsure of their opinion or when facts do not back it. The use of hedges can also indicate the reviewer's confidence, or measure of conviction, in the review. Reviewer confidence matters in the peer review process, especially to editors or chairs, for judging the quality of the evaluation of the paper under review. However, the self-annotated reviewer confidence score is often miscalibrated or biased and does not accurately represent the reviewer's conviction in their judgment of the paper's merit. Less confident reviewers sometimes speculate in their observations. In this paper, we introduce HedgePeer, a new uncertainty detection dataset of peer review comments, which is more than five times larger than existing hedge detection datasets in other domains. We curate the dataset from open-access reviews available on the open review platform and annotate the review comments for hedge cues and hedge spans. We also provide several baseline approaches, including a multitask learning model with sentiment intensity and parts-of-speech as scaffold tasks to predict hedge cues and spans. We make our dataset and baseline code available at https://github.com/Tirthankar-Ghosal/HedgePeer-Dataset. Our dataset is motivated by the goal of computationally estimating the reviewer's conviction from their review texts. CCS Concepts: Computing methodologies → Information extraction; Information systems → Information extraction. |
---|---|
Author | Kordoni, Valia; Ghosal, Tirthankar; Varanasi, Kamal Kaushik |
Author_xml | 1. Ghosal, Tirthankar (ghosal@ufal.mff.cuni.cz), Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Prague, Czech Republic; 2. Varanasi, Kamal Kaushik (1801ce31@iitp.ac.in), Indian Institute of Technology Patna, Department of Civil Engineering, Patna, Bihar, India; 3. Kordoni, Valia (kordonie@rz.hu-berlin.de), Humboldt-Universitaet zu Berlin, Department of English Studies, Berlin, Germany |
EISBN | 1450393454 9781450393454 |
EndPage | 5 |
ExternalDocumentID | 9852963 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
PageCount | 5 |
ParticipantIDs | ieee_primary_9852963 |
PublicationTitleAbbrev | JCDL |
SubjectTerms | Hedges; Information retrieval; Libraries; Linguistics; Measurement uncertainty; Peer Reviews; Predictive models; Reviewer Confidence; Uncertainty; Uncertainty Detection; Writing |
URI | https://ieeexplore.ieee.org/document/9852963 |