Extracting circumstances of Covid-19 transmission from free text with large language models
Identifying the circumstances of transmission of an emerging infectious disease rapidly is central for mitigation efforts. Here, we explore how large language models (LLMs) can automatically extract such circumstances from free-text descriptions in online surveys, in the context of Covid-19. In a na...
Published in | Nature Communications Vol. 16; no. 1; article 5836 (13 pp.) |
---|---|
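Main Authors | Bizel-Bizellot, Gaston; Galmiche, Simon; Lelandais, Benoît; Charmet, Tiffany; Coudeville, Laurent; Fontanet, Arnaud; Zimmer, Christophe |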
Format | Journal Article |
Language | English |
Published | London; Nature Publishing Group UK; 01.07.2025; Nature Publishing Group; Nature Portfolio |
Abstract | Identifying the circumstances of transmission of an emerging infectious disease rapidly is central for mitigation efforts. Here, we explore how large language models (LLMs) can automatically extract such circumstances from free-text descriptions in online surveys, in the context of Covid-19. In a nationwide study conducted online in France, we enrolled 545,958 adults with recent SARS-CoV-2 infection and inquired about the circumstances of transmission in both closed-ended and open-ended questions. First, we trained a classification model based on a pretrained LLM to predict one of seven predefined infection contexts (Work, Family, Friends, Sports, Cultural, Religious, Other) from the free text in answers to open-ended questions. We achieved an unbalanced accuracy of 75%, which increased to 91% when eliminating the 43% highest entropy responses. Second, we used topic modeling to define clusters of transmission circumstances agnostically. This led to 23 clusters, which agreed with the seven predefined infection contexts, but also provided finer details on previously undefined circumstances of transmission. Our study suggests that LLM-based analysis of free text may alleviate the need for closed-ended questions in epidemiological surveys and enable insights into previously unsuspected circumstances of transmission. This approach is poised to accelerate and enrich the acquisition of epidemiological insights in future pandemics.
Open-ended survey questions may provide useful detail on possible venues of transmission of infectious diseases, but data are difficult to analyse at scale. Here, the authors use large language models to extract potential transmission venues in ~80,000 responses to an open-ended COVID-19 survey question in France. |
---|---|
ArticleNumber | 5836 |
Author | Fontanet, Arnaud; Bizel-Bizellot, Gaston; Galmiche, Simon; Charmet, Tiffany; Zimmer, Christophe; Coudeville, Laurent; Lelandais, Benoît |
Author_xml | – sequence: 1 givenname: Gaston surname: Bizel-Bizellot fullname: Bizel-Bizellot, Gaston organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit – sequence: 2 givenname: Simon surname: Galmiche fullname: Galmiche, Simon organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit, McGill University, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital – sequence: 3 givenname: Benoît orcidid: 0000-0003-0321-4015 surname: Lelandais fullname: Lelandais, Benoît organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Institut Pasteur, Université Paris Cité, Image Analysis Hub – sequence: 4 givenname: Tiffany surname: Charmet fullname: Charmet, Tiffany organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit – sequence: 5 givenname: Laurent orcidid: 0000-0003-3651-2959 surname: Coudeville fullname: Coudeville, Laurent organization: Sanofi Vaccines, Global Medical – sequence: 6 givenname: Arnaud surname: Fontanet fullname: Fontanet, Arnaud email: arnaud.fontanet@pasteur.fr organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit, Conservatoire national des arts et métiers, PACRI Unit – sequence: 7 givenname: Christophe orcidid: 0000-0001-9910-1589 surname: Zimmer fullname: Zimmer, Christophe email: czimmer@pasteur.fr organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, University of Würzburg, Rudolf Virchow Center for integrative and translational bioimaging, University of Würzburg, Center for Artificial Intelligence and Data Science (CAIDAS) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/40593634 (record in MEDLINE/PubMed); https://pasteur.hal.science/pasteur-05172560 (record in HAL) |
BookMark | eNp9kk1v1DAQhiNUREvpH-CAInHhEvC3kxOqVoVWWokLnDhYjjPJZpXYi53stv-eYVNK2wM-eEb2M-_YM_M6O_HBQ5a9peQjJbz8lAQVSheEyUIRrVhxeJGdMSJoQTXjJ4_80-wipS3BxStaCvEqOxVEVlxxcZb9vLqdonVT77vc9dHNY5qsd5Dy0OarsO-bglY5Ij6NfUp98Hkbw4gbQD7B7ZQf-mmTDzZ2gLvvZovOGBoY0pvsZWuHBBf39jz78eXq--q6WH_7erO6XBdOcjYVHCRAWzteagCty5JK0jac11zbmkElFThNRWlVKZlwQttSOCZqyZQmZSP4eXaz6DbBbs0u9qONdybY3hwPQuyMjVPvBjDCNlZr2rqyJqJuSA1Vg4Y6UdmWlxK1Pi9au7keoXHg8e_DE9GnN77fmC7sDWWMVkpVqFAsCptncdeXa7OzaYI5GiKxM1KRPUX-w33GGH7NkCaDhXYwYDEhzMlwxhSXHAuA6Ptn6DbM0WNtjxTD7JQh9e7xFx7e8LfpCLAFcDGkFKF9QCgxf4bLLMNlcLjMcbjMAYP4EpQQ9h3Ef7n_E_UbC1fRng |
ContentType | Journal Article |
Copyright | The Author(s) 2025; Copyright Nature Publishing Group 2025; Attribution - NonCommercial - NoDerivatives |
Copyright_xml | – notice: The Author(s) 2025 – notice: 2025. The Author(s). – notice: Copyright Nature Publishing Group 2025 – notice: Attribution - NonCommercial - NoDerivatives – notice: The Author(s) 2025 2025 |
DOI | 10.1038/s41467-025-60762-w |
Discipline | Biology; Public Health; Computer Science |
EISSN | 2041-1723 |
EndPage | 13 |
ExternalDocumentID | oai_doaj_org_article_4ada771fc8b04bd0be9dbd01c49af385 PMC12219669 oai_HAL_pasteur_05172560v1 40593634 10_1038_s41467_025_60762_w |
Genre | Journal Article |
GeographicLocations | France |
GeographicLocations_xml | – name: France |
GrantInformation_xml | – fundername: Institut Pasteur funderid: https://doi.org/10.13039/501100003762 |
ISSN | 2041-1723 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Language | English |
License | 2025. The Author(s). Attribution - NonCommercial - NoDerivatives: http://creativecommons.org/licenses/by-nc-nd Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/. |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 PMCID: PMC12219669 |
ORCID | 0000-0001-9910-1589 0000-0003-3651-2959 0000-0003-0321-4015 0000-0002-1691-1744 |
OpenAccessLink | https://www.proquest.com/docview/3226269312?pq-origsite=%requestingapplication% |
PMID | 40593634 |
PQID | 3226269312 |
PQPubID | 546298 |
PageCount | 13 |
PublicationCentury | 2000 |
PublicationDate | 2025-07-01 |
PublicationDateYYYYMMDD | 2025-07-01 |
PublicationDate_xml | – month: 07 year: 2025 text: 2025-07-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | London |
PublicationPlace_xml | – name: London – name: England |
PublicationTitle | Nature communications |
PublicationTitleAbbrev | Nat Commun |
PublicationTitleAlternate | Nat Commun |
PublicationYear | 2025 |
Publisher | Nature Publishing Group UK; Nature Publishing Group; Nature Portfolio |
Publisher_xml | – name: Nature Publishing Group UK – name: Nature Publishing Group – name: Nature Portfolio |
SourceID | doaj pubmedcentral hal proquest pubmed crossref springer |
SourceType | Open Website Open Access Repository Aggregation Database Index Database Publisher |
StartPage | 5836 |
SubjectTerms | 631/114/1305; 631/326/596/4130; 692/499; 692/699/255; 692/700/478/174; Adult; Clusters; Computation and Language; Computer Science; COVID-19; COVID-19 - epidemiology; COVID-19 - transmission; Disease transmission; Document and Text Processing; Epidemiology; Female; France - epidemiology; Households; Humanities and Social Sciences; Humans; Infections; Infectious diseases; Language; Large Language Models; Life Sciences; Male; Middle Aged; multidisciplinary; Natural language processing; Pandemics; Polls & surveys; Public health; Questionnaires; Questions; Santé publique et épidémiologie; SARS-CoV-2; Science; Science (multidisciplinary); Severe acute respiratory syndrome coronavirus 2; Sociodemographics; Surveys; Surveys and Questionnaires; Viral diseases |
Title | Extracting circumstances of Covid-19 transmission from free text with large language models |
URI | https://link.springer.com/article/10.1038/s41467-025-60762-w https://www.ncbi.nlm.nih.gov/pubmed/40593634 https://www.proquest.com/docview/3226269312 https://www.proquest.com/docview/3226353148 https://pasteur.hal.science/pasteur-05172560 https://pubmed.ncbi.nlm.nih.gov/PMC12219669 https://doaj.org/article/4ada771fc8b04bd0be9dbd01c49af385 |
Volume | 16 |