Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study

Bibliographic Details
Published in JMIR medical informatics Vol. 10; no. 4; p. e35734
Main Authors El Emam, Khaled; Mosquera, Lucy; Fang, Xi; El-Hussuna, Alaa
Format Journal Article
Language English
Published Canada JMIR Publications 07.04.2022
Abstract A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. This study evaluates the ability of common utility metrics to rank SDG methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research. We evaluated 6 utility metrics on 30 different health data sets and 3 different SDG methods (a Bayesian network, a Generative Adversarial Network, and sequential tree synthesis). These metrics were computed by averaging across 20 synthetic data sets from the same generative model. The metrics were then tested on their ability to rank the SDG methods based on prediction performance. Prediction performance was defined as the difference between each of the area under the receiver operating characteristic curve and area under the precision-recall curve values on synthetic data logistic regression prediction models versus real data models. The utility metric best able to rank SDG methods was the multivariate Hellinger distance based on a Gaussian copula representation of real and synthetic joint distributions. This study has validated a generative model utility metric, the multivariate Hellinger distance, which can be used to reliably rank competing SDG methods on the same data set. The Hellinger distance metric can be used to evaluate and compare alternate SDG methods.
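As an illustration of the metric named in the abstract, the following is a minimal sketch of a Hellinger distance computed on a Gaussian copula representation of two tables. It is not the authors' implementation: the rank-based normal scores, the positional tie-breaking, the closed-form distance between two zero-mean Gaussians, and all function names (`normal_scores`, `correlation_matrix`, `determinant`, `copula_hellinger`) are assumptions made for this example, and it expects numeric columns that are not perfectly collinear.

```python
import math
from statistics import NormalDist


def normal_scores(column):
    """Map one column to normal scores via its ranks (ties broken by position)."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[i] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([x - mean for x in col])
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    d = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(d)]
        for i in range(d)
    ]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    d = len(m)
    det = 1.0
    for k in range(d):
        pivot = max(range(k, d), key=lambda r: abs(m[r][k]))
        if abs(m[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            det = -det
        det *= m[k][k]
        for r in range(k + 1, d):
            factor = m[r][k] / m[k][k]
            for c in range(k, d):
                m[r][c] -= factor * m[k][c]
    return det


def copula_hellinger(real_columns, synth_columns):
    """Hellinger distance between zero-mean Gaussians fitted to the normal scores.

    Assumed closed form: H^2 = 1 - |R1|^(1/4) |R2|^(1/4) / |(R1 + R2) / 2|^(1/2).
    """
    r1 = correlation_matrix([normal_scores(c) for c in real_columns])
    r2 = correlation_matrix([normal_scores(c) for c in synth_columns])
    d = len(r1)
    avg = [[(r1[i][j] + r2[i][j]) / 2 for j in range(d)] for i in range(d)]
    bc = (determinant(r1) * determinant(r2)) ** 0.25 / math.sqrt(determinant(avg))
    return math.sqrt(max(0.0, 1.0 - bc))
```

In a setup like the one the abstract describes, such a value would be computed for each synthetic data set produced by a generative model and then averaged (20 sets in the study); a smaller distance suggests the synthetic joint distribution is closer to the real one.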
Author El Emam, Khaled
Mosquera, Lucy
Fang, Xi
El-Hussuna, Alaa
AuthorAffiliation 1 School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
2 Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada
3 Replica Analytics Ltd, Ottawa, ON, Canada
4 Open Source Research Collaboration, Aalborg, Denmark
Author_xml – sequence: 1
  givenname: Khaled
  orcidid: 0000-0003-3325-4149
  surname: El Emam
  fullname: El Emam, Khaled
– sequence: 2
  givenname: Lucy
  orcidid: 0000-0002-5289-8372
  surname: Mosquera
  fullname: Mosquera, Lucy
– sequence: 3
  givenname: Xi
  orcidid: 0000-0002-5571-7004
  surname: Fang
  fullname: Fang, Xi
– sequence: 4
  givenname: Alaa
  orcidid: 0000-0002-0070-8362
  surname: El-Hussuna
  fullname: El-Hussuna, Alaa
BackLink https://www.ncbi.nlm.nih.gov/pubmed/35389366 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022.
2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.2196/35734
Discipline Medicine
EISSN 2291-9694
ExternalDocumentID oai_doaj_org_article_a29d415257434ee8a2598b8cc3dc3ee2
PMC9030990
35389366
10_2196_35734
Genre Journal Article
ISSN 2291-9694
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 4
Keywords model validation
medical informatics
prediction model
synthetic data generation
utility metric
synthetic data
generative models
data privacy
data utility
binary prediction model
logistic regression
Language English
License Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
ORCID 0000-0002-5571-7004
0000-0002-0070-8362
0000-0002-5289-8372
0000-0003-3325-4149
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.2196/35734
PMID 35389366
PQID 2657516028
PQPubID 4997117
PublicationCentury 2000
PublicationDate 2022-04-07
PublicationDecade 2020
PublicationPlace Toronto, Canada
PublicationTitle JMIR medical informatics
PublicationTitleAlternate JMIR Med Inform
PublicationYear 2022
Publisher JMIR Publications
StartPage e35734
SubjectTerms Cluster analysis
Datasets
Decision making
Multimedia
Original Paper
Privacy
Time series
Validation studies
Workloads
Title Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study
URI https://www.ncbi.nlm.nih.gov/pubmed/35389366
https://www.proquest.com/docview/2657516028
https://www.proquest.com/docview/2648062945
https://pubmed.ncbi.nlm.nih.gov/PMC9030990
https://doaj.org/article/a29d415257434ee8a2598b8cc3dc3ee2
Volume 10