Evaluating Generalizability of Fine-Tuned Models for Fake News Detection
Published in | arXiv.org |
---|---|
Main Authors | Suprem, Abhijit; Calton Pu |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 23.05.2022 |
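Subjects | Cluster analysis; Clustering; Coronaviruses; COVID-19; Datasets; False information; News; Performance degradation; Training; Vector quantization |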
Online Access | https://www.proquest.com/docview/2665374436 |
Abstract | The Covid-19 pandemic has caused a dramatic and parallel rise in dangerous misinformation, which the CDC and WHO have called an 'infodemic'. Misinformation tied to the Covid-19 infodemic changes continuously, and this can degrade the performance of fine-tuned models through concept drift. The degradation can be mitigated if models generalize well enough to capture some cyclical aspects of the drifted data. In this paper, we explore the generalizability of pre-trained and fine-tuned fake news detectors across 9 fake news datasets. We show that existing models often overfit on their training dataset and perform poorly on unseen data. However, models achieve higher accuracy on some subsets of unseen data that overlap with the training data. Based on this observation, we also present KMeans-Proxy, a fast and effective method based on K-Means clustering for quickly identifying these overlapping subsets of unseen data. KMeans-Proxy improves generalizability on unseen fake news datasets by 0.1 to 0.2 F1 points across datasets. We present both our generalizability experiments and KMeans-Proxy to further research on the fake news problem. |
---|---|
Author | Suprem, Abhijit; Calton Pu |
ContentType | Paper |
Copyright | 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DBID | 8FE 8FG ABJCF ABUWG AFKRA AZQEC BENPR BGLVJ CCPQU COVID DWQXO HCIFZ L6V M7S PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
DatabaseName | ProQuest SciTech Collection; ProQuest Technology Collection; Materials Science & Engineering Collection; ProQuest Central (Alumni); ProQuest Central; ProQuest Central Essentials; ProQuest Central; Technology Collection; ProQuest One Community College; Coronavirus Research Database; ProQuest Central Korea; SciTech Premium Collection; ProQuest Engineering Collection; Engineering Database; Publicly Available Content Database; ProQuest One Academic Eastern Edition (DO NOT USE); ProQuest One Academic; ProQuest One Academic UKI Edition; ProQuest Central China; Engineering Collection |
DatabaseTitle | Publicly Available Content Database; Engineering Database; Technology Collection; ProQuest Central Essentials; ProQuest One Academic Eastern Edition; Coronavirus Research Database; ProQuest Central (Alumni Edition); SciTech Premium Collection; ProQuest One Community College; ProQuest Technology Collection; ProQuest SciTech Collection; ProQuest Central China; ProQuest Central; ProQuest Engineering Collection; ProQuest One Academic UKI Edition; ProQuest Central Korea; Materials Science & Engineering Collection; ProQuest One Academic; Engineering Collection |
DatabaseTitleList | Publicly Available Content Database |
Database_xml | – sequence: 1 dbid: 8FG name: ProQuest Technology Collection url: https://search.proquest.com/technologycollection1 sourceTypes: Aggregation Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Physics |
EISSN | 2331-8422 |
Genre | Working Paper/Pre-Print |
GroupedDBID | 8FE 8FG ABJCF ABUWG AFKRA ALMA_UNASSIGNED_HOLDINGS AZQEC BENPR BGLVJ CCPQU COVID DWQXO FRJ HCIFZ L6V M7S M~E PIMPY PQEST PQQKQ PQUKI PRINS PTHSS |
ID | FETCH-proquest_journals_26653744363 |
IEDL.DBID | BENPR |
IngestDate | Thu Oct 10 17:25:39 EDT 2024 |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-proquest_journals_26653744363 |
OpenAccessLink | https://www.proquest.com/docview/2665374436?pq-origsite=%requestingapplication% |
PQID | 2665374436 |
PQPubID | 2050157 |
ParticipantIDs | proquest_journals_2665374436 |
PublicationCentury | 2000 |
PublicationDate | 20220523 |
PublicationDateYYYYMMDD | 2022-05-23 |
PublicationDecade | 2020 |
PublicationPlace | Ithaca |
PublicationTitle | arXiv.org |
PublicationYear | 2022 |
Publisher | Cornell University Library, arXiv.org |
SSID | ssj0002672553 |
Score | 3.3954508 |
SecondaryResourceType | preprint |
SourceID | proquest |
SourceType | Aggregation Database |
SubjectTerms | Cluster analysis; Clustering; Coronaviruses; COVID-19; Datasets; False information; News; Performance degradation; Training; Vector quantization |
Title | Evaluating Generalizability of Fine-Tuned Models for Fake News Detection |
URI | https://www.proquest.com/docview/2665374436 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | ProQuest |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Evaluating+Generalizability+of+Fine-Tuned+Models+for+Fake+News+Detection&rft.jtitle=arXiv.org&rft.au=Suprem%2C+Abhijit&rft.au=Calton+Pu&rft.date=2022-05-23&rft.pub=Cornell+University+Library%2C+arXiv.org&rft.eissn=2331-8422 |
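The abstract describes KMeans-Proxy only at a high level: K-Means clustering is used to flag the subsets of an unseen dataset that overlap with the training data. The sketch below illustrates that idea under stated assumptions. Each article is assumed to be already mapped to a fixed-length embedding vector. An unseen point is assumed to "overlap" when it falls within the radius of its nearest training cluster, where the radius is the largest member-to-centroid distance. The radius rule, the farthest-first seeding, and the helper names (`fit_proxy`, `select_overlapping`) are hypothetical illustrations, not the authors' implementation.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    dists = [math.dist(point, c) for c in centroids]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then
    repeatedly add the point farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=100):
    """Plain Lloyd's K-Means on equal-length numeric tuples."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its members; keep empty ones in place.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # centroids stopped moving: converged
            break
        centroids = updated
    return centroids


def fit_proxy(train_embeddings, k):
    """Hypothetical helper: cluster the training embeddings and record each
    cluster's radius (largest member-to-centroid distance)."""
    centroids = kmeans(train_embeddings, k)
    radii = [0.0] * k
    for p in train_embeddings:
        i, d = nearest(p, centroids)
        radii[i] = max(radii[i], d)
    return centroids, radii


def select_overlapping(unseen_embeddings, centroids, radii):
    """Hypothetical helper: keep the unseen points that fall inside the radius
    of their nearest training cluster (the assumed 'overlapping' subset)."""
    selected = []
    for p in unseen_embeddings:
        i, d = nearest(p, centroids)
        if d <= radii[i]:
            selected.append(p)
    return selected


# Toy 2-D "embeddings": two well-separated training clusters.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, radii = fit_proxy(train, k=2)
unseen = [(0.5, 0.6), (5.0, 5.0), (10.2, 10.9)]
print(select_overlapping(unseen, centroids, radii))
# [(0.5, 0.6), (10.2, 10.9)]
```

In this toy run, the midpoint `(5.0, 5.0)` lies outside both cluster radii, so it is dropped. The other two points fall inside a training cluster and are kept. Using the maximum member distance as the radius is the most permissive choice. A percentile of member distances would give a stricter filter.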