Simulants: Synthetic Clinical Trial Data via Subject-Level Privacy-Preserving Synthesis

Bibliographic Details
Published in AMIA ... Annual Symposium proceedings Vol. 2022; p. 231
Main Authors Beigi, Mandis, Shafquat, Afrah, Mezey, Jason, Aptekar, Jacob
Format Journal Article
Language English
Published United States 2022
ISSN 1942-597X
EISSN 1559-4076

Abstract Clinical trials capture high-quality data for millions of patients each year, yet these data are largely unavailable for research beyond the scope of any individual trial due to a combination of regulatory, intellectual property, and patient privacy barriers. Synthetic clinical trial data that capture the analytical properties of the source data could provide significant value for research and drug development by making insights widely available while protecting the privacy of the participants. We present a method, "Simulants," for generating research-grade synthetic clinical trial data from a real data source. We compared the fidelity and privacy preservation performance of Simulants to that of state-of-the-art deep learning synthesizers and found that Simulants had superior performance when applied to clinical trial data, as assessed both by established metrics and by consideration of critical clinical features. We also demonstrate how Simulants' privacy settings may be configured to conform to specific privacy policies governing data sharing.
Author_xml – sequence: 1
  givenname: Mandis
  surname: Beigi
  fullname: Beigi, Mandis
  organization: Medidata, New York, NY, USA
– sequence: 2
  givenname: Afrah
  surname: Shafquat
  fullname: Shafquat, Afrah
  organization: Medidata, New York, NY, USA
– sequence: 3
  givenname: Jason
  surname: Mezey
  fullname: Mezey, Jason
  organization: Cornell University, Ithaca, NY, USA
– sequence: 4
  givenname: Jacob
  surname: Aptekar
  fullname: Aptekar, Jacob
  organization: Medidata, New York, NY, USA
BackLink https://www.ncbi.nlm.nih.gov/pubmed/37128411 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright 2022 AMIA - All rights reserved.
Copyright_xml – notice: 2022 AMIA - All rights reserved.
Discipline Medicine
EISSN 1559-4076
ExternalDocumentID 37128411
Genre Journal Article
ISSN 1942-597X
IsPeerReviewed true
IsScholarly true
Language English
License 2022 AMIA - All rights reserved.
PMID 37128411
PublicationDate 2022-00-00
20220101
PublicationDateYYYYMMDD 2022-01-01
PublicationDate_xml – year: 2022
  text: 2022-00-00
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle AMIA ... Annual Symposium proceedings
PublicationTitleAlternate AMIA Annu Symp Proc
PublicationYear 2022
StartPage 231
SubjectTerms Confidentiality
Data Accuracy
Humans
Information Dissemination - methods
Privacy
Title Simulants: Synthetic Clinical Trial Data via Subject-Level Privacy-Preserving Synthesis
URI https://www.ncbi.nlm.nih.gov/pubmed/37128411
https://www.proquest.com/docview/2808586981
Volume 2022