Extracting circumstances of Covid-19 transmission from free text with large language models

Bibliographic Details
Published in Nature Communications Vol. 16; no. 1; pp. 5836 - 13
Main Authors Bizel-Bizellot, Gaston, Galmiche, Simon, Lelandais, Benoît, Charmet, Tiffany, Coudeville, Laurent, Fontanet, Arnaud, Zimmer, Christophe
Format Journal Article
Language English
Published London Nature Publishing Group UK 01.07.2025
Nature Publishing Group
Nature Portfolio
Abstract Rapidly identifying the circumstances of transmission of an emerging infectious disease is central to mitigation efforts. Here, we explore how large language models (LLMs) can automatically extract such circumstances from free-text descriptions in online surveys, in the context of Covid-19. In a nationwide study conducted online in France, we enrolled 545,958 adults with recent SARS-CoV-2 infection and inquired about the circumstances of transmission in both closed-ended and open-ended questions. First, we trained a classification model based on a pretrained LLM to predict one of seven predefined infection contexts (Work, Family, Friends, Sports, Cultural, Religious, Other) from the free text in answers to open-ended questions. We achieved an unbalanced accuracy of 75%, which increased to 91% when eliminating the 43% of responses with the highest entropy. Second, we used topic modeling to define clusters of transmission circumstances agnostically. This led to 23 clusters, which agreed with the seven predefined infection contexts, but also provided finer details on previously undefined circumstances of transmission. Our study suggests that LLM-based analysis of free text may alleviate the need for closed-ended questions in epidemiological surveys and enable insights into previously unsuspected circumstances of transmission. This approach is poised to accelerate and enrich the acquisition of epidemiological insights in future pandemics.
Open-ended survey questions may provide useful detail on possible venues of transmission of infectious diseases, but data are difficult to analyse at scale. Here, the authors use large language models to extract potential transmission venues in ~80,000 responses to an open-ended COVID-19 survey question in France.
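The abstract reports that accuracy rose once the highest-entropy responses were set aside. The sketch below illustrates that general idea (ranking predictions by the Shannon entropy of their class probabilities and discarding the least certain fraction) in plain Python. It is not the authors' code: the function names, the toy probabilities, and the toy labels are all invented for illustration, and this record does not state how the study computed or thresholded entropy.

```python
import math

# The seven predefined infection contexts named in the abstract.
CONTEXTS = ["Work", "Family", "Friends", "Sports", "Cultural", "Religious", "Other"]


def entropy(probs):
    """Shannon entropy (in nats) of one discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def filter_by_entropy(prob_rows, drop_fraction):
    """Return the indices kept after dropping the highest-entropy fraction.

    Hypothetical helper: `drop_fraction` is the share of responses to reject.
    """
    n_drop = int(round(drop_fraction * len(prob_rows)))
    # Order responses from most certain (low entropy) to least certain.
    order = sorted(range(len(prob_rows)), key=lambda i: entropy(prob_rows[i]))
    return sorted(order[: len(prob_rows) - n_drop])


def accuracy(prob_rows, labels, indices):
    """Fraction of the selected responses whose top class matches the label."""
    indices = list(indices)
    if not indices:
        raise ValueError("no responses selected")
    correct = 0
    for i in indices:
        row = prob_rows[i]
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == labels[i])
    return correct / len(indices)


# Toy classifier outputs (made up, NOT study data): one row per response.
TOY_PROBS = [
    [0.94, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],  # confident: Work
    [0.02, 0.90, 0.02, 0.02, 0.02, 0.01, 0.01],  # confident: Family
    [0.30, 0.25, 0.20, 0.10, 0.05, 0.05, 0.05],  # uncertain
    [0.20, 0.22, 0.18, 0.10, 0.10, 0.10, 0.10],  # uncertain
]
TOY_LABELS = [0, 1, 2, 1]  # indices into CONTEXTS

if __name__ == "__main__":
    kept = filter_by_entropy(TOY_PROBS, drop_fraction=0.5)
    print("accuracy on all responses:", accuracy(TOY_PROBS, TOY_LABELS, range(4)))
    print("accuracy on kept responses:", accuracy(TOY_PROBS, TOY_LABELS, kept))
```

Rejecting by rank rather than by a fixed entropy cutoff keeps the discarded share predictable, which matches how the abstract phrases the result (a stated percentage of responses eliminated). A fixed cutoff would be an equally plausible design.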
ArticleNumber 5836
Author Fontanet, Arnaud
Bizel-Bizellot, Gaston
Galmiche, Simon
Charmet, Tiffany
Zimmer, Christophe
Coudeville, Laurent
Lelandais, Benoît
Author_xml – sequence: 1
  givenname: Gaston
  surname: Bizel-Bizellot
  fullname: Bizel-Bizellot, Gaston
  organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit
– sequence: 2
  givenname: Simon
  surname: Galmiche
  fullname: Galmiche, Simon
  organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit, McGill University, Centre for Clinical Epidemiology, Lady Davis Institute for Medical Research, Jewish General Hospital
– sequence: 3
  givenname: Benoît
  orcidid: 0000-0003-0321-4015
  surname: Lelandais
  fullname: Lelandais, Benoît
  organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, Institut Pasteur, Université Paris Cité, Image Analysis Hub
– sequence: 4
  givenname: Tiffany
  surname: Charmet
  fullname: Charmet, Tiffany
  organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit
– sequence: 5
  givenname: Laurent
  orcidid: 0000-0003-3651-2959
  surname: Coudeville
  fullname: Coudeville, Laurent
  organization: Sanofi Vaccines, Global Medical
– sequence: 6
  givenname: Arnaud
  surname: Fontanet
  fullname: Fontanet, Arnaud
  email: arnaud.fontanet@pasteur.fr
  organization: Institut Pasteur, Université Paris Cité, Epidemiology of Emerging Diseases Unit, Conservatoire national des arts et métiers, PACRI Unit
– sequence: 7
  givenname: Christophe
  orcidid: 0000-0001-9910-1589
  surname: Zimmer
  fullname: Zimmer, Christophe
  email: czimmer@pasteur.fr
  organization: Institut Pasteur, Université Paris Cité, Imaging and Modeling Unit, University of Würzburg, Rudolf Virchow Center for integrative and translational bioimaging, University of Würzburg, Center for Artificial Intelligence and Data Science (CAIDAS)
BackLink https://www.ncbi.nlm.nih.gov/pubmed/40593634 (record in MEDLINE/PubMed)
https://pasteur.hal.science/pasteur-05172560 (record in HAL)
ContentType Journal Article
Copyright The Author(s) 2025
2025. The Author(s).
Copyright Nature Publishing Group 2025
Attribution - NonCommercial - NoDerivatives
The Author(s) 2025 2025
DOI 10.1038/s41467-025-60762-w
Discipline Biology
Public Health
Computer Science
EISSN 2041-1723
EndPage 13
ExternalDocumentID oai_doaj_org_article_4ada771fc8b04bd0be9dbd01c49af385
PMC12219669
oai_HAL_pasteur_05172560v1
40593634
10_1038_s41467_025_60762_w
Genre Journal Article
GeographicLocations France
GeographicLocations_xml – name: France
GrantInformation_xml – fundername: Institut Pasteur
  funderid: https://doi.org/10.13039/501100003762
ISSN 2041-1723
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
License 2025. The Author(s).
Attribution - NonCommercial - NoDerivatives: http://creativecommons.org/licenses/by-nc-nd
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Notes PMCID: PMC12219669
ORCID 0000-0001-9910-1589
0000-0003-3651-2959
0000-0003-0321-4015
0000-0002-1691-1744
OpenAccessLink https://www.proquest.com/docview/3226269312?pq-origsite=%requestingapplication%
PMID 40593634
PQID 3226269312
PQPubID 546298
PageCount 13
PublicationCentury 2000
PublicationDate 2025-07-01
PublicationDateYYYYMMDD 2025-07-01
PublicationDate_xml – month: 07
  year: 2025
  text: 2025-07-01
  day: 01
PublicationDecade 2020
PublicationPlace London
PublicationPlace_xml – name: London
– name: England
PublicationTitle Nature communications
PublicationTitleAbbrev Nat Commun
PublicationTitleAlternate Nat Commun
PublicationYear 2025
Publisher Nature Publishing Group UK
Nature Publishing Group
Nature Portfolio
Publisher_xml – name: Nature Publishing Group UK
– name: Nature Publishing Group
– name: Nature Portfolio
SSID ssj0000391844
Score 2.4737256
StartPage 5836
SubjectTerms 631/114/1305
631/326/596/4130
692/499
692/699/255
692/700/478/174
Adult
Clusters
Computation and Language
Computer Science
COVID-19
COVID-19 - epidemiology
COVID-19 - transmission
Disease transmission
Document and Text Processing
Epidemiology
Female
France - epidemiology
Households
Humanities and Social Sciences
Humans
Infections
Infectious diseases
Language
Large Language Models
Life Sciences
Male
Middle Aged
multidisciplinary
Natural language processing
Pandemics
Polls & surveys
Public health
Questionnaires
Questions
Santé publique et épidémiologie
SARS-CoV-2
Science
Science (multidisciplinary)
Severe acute respiratory syndrome coronavirus 2
Sociodemographics
Surveys
Surveys and Questionnaires
Viral diseases
Title Extracting circumstances of Covid-19 transmission from free text with large language models
URI https://link.springer.com/article/10.1038/s41467-025-60762-w
https://www.ncbi.nlm.nih.gov/pubmed/40593634
https://www.proquest.com/docview/3226269312
https://www.proquest.com/docview/3226353148
https://pasteur.hal.science/pasteur-05172560
https://pubmed.ncbi.nlm.nih.gov/PMC12219669
https://doaj.org/article/4ada771fc8b04bd0be9dbd01c49af385
Volume 16