Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study

Bibliographic Details
Published in	JMIR medical informatics Vol. 10; no. 4; p. e35734
Main Authors	El Emam, Khaled; Mosquera, Lucy; Fang, Xi; El-Hussuna, Alaa
Format	Journal Article
Language	English
Published	Canada JMIR Publications 07.04.2022
Subjects	Cluster analysis; Datasets; Decision making; Multimedia; Original Paper; Privacy; Time series; Validation studies; Workloads; model validation; medical informatics; prediction model; synthetic data generation; utility metric; synthetic data; generative models; data privacy; data utility; binary prediction model; logistic regression
Abstract	Background: A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. Objective: This study evaluates the ability of common utility metrics to rank SDG methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research. Methods: We evaluated 6 utility metrics on 30 different health data sets and 3 different SDG methods (a Bayesian network, a Generative Adversarial Network, and sequential tree synthesis). These metrics were computed by averaging across 20 synthetic data sets from the same generative model. The metrics were then tested on their ability to rank the SDG methods based on prediction performance. Prediction performance was defined as the difference between each of the area under the receiver operating characteristic curve and area under the precision-recall curve values on synthetic data logistic regression prediction models versus real data models. Results: The utility metric best able to rank SDG methods was the multivariate Hellinger distance based on a Gaussian copula representation of real and synthetic joint distributions. Conclusions: This study has validated a generative model utility metric, the multivariate Hellinger distance, which can be used to reliably rank competing SDG methods on the same data set. The Hellinger distance metric can be used to evaluate and compare alternate SDG methods.
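The ranking procedure described in the abstract (compute a utility metric per synthetic data set, average it across replicates from the same generative model, then order the SDG methods) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the univariate Hellinger distance on a single categorical variable rather than the multivariate, Gaussian-copula-based version the study validated, and the function names, method labels, and toy data are all hypothetical.

```python
import math
from collections import Counter
from statistics import mean


def hellinger(real, synth):
    """Hellinger distance between the empirical distributions of two
    categorical samples. Ranges from 0 (identical) to 1 (disjoint support)."""
    p, q = Counter(real), Counter(synth)
    n_p, n_q = len(real), len(synth)
    categories = set(p) | set(q)
    # Counter returns 0 for categories missing from one sample.
    total = sum(
        (math.sqrt(p[c] / n_p) - math.sqrt(q[c] / n_q)) ** 2
        for c in categories
    )
    return math.sqrt(total / 2)


def rank_methods(real, synth_by_method):
    """Average the distance over each method's replicate synthetic data sets,
    then order methods from lowest (closest to real) to highest distance."""
    avg = {
        method: mean(hellinger(real, rep) for rep in replicates)
        for method, replicates in synth_by_method.items()
    }
    return sorted(avg, key=avg.get), avg


# Toy data (hypothetical): one binary variable, two made-up SDG methods.
real = ["a"] * 50 + ["b"] * 50
synth_by_method = {
    "method_x": [["a"] * 50 + ["b"] * 50, ["a"] * 48 + ["b"] * 52],
    "method_y": [["a"] * 90 + ["b"] * 10],
}
order, averages = rank_methods(real, synth_by_method)
print(order, averages)
```

In the study itself, 20 synthetic data sets per generative model were averaged, and the resulting ranking was compared against the ranking by prediction performance; the sketch only covers the metric-and-rank step.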
Author	Fang, Xi; El-Hussuna, Alaa; El Emam, Khaled; Mosquera, Lucy
AuthorAffiliation	1 School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada; 2 Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada; 3 Replica Analytics Ltd, Ottawa, ON, Canada; 4 Open Source Research Collaboration, Aarlberg, Denmark
Author_xml	1. El Emam, Khaled (ORCID 0000-0003-3325-4149); 2. Mosquera, Lucy (ORCID 0000-0002-5289-8372); 3. Fang, Xi (ORCID 0000-0002-5571-7004); 4. El-Hussuna, Alaa (ORCID 0000-0002-0070-8362)
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/35389366$$D View this record in MEDLINE/PubMed
CitedBy_id	crossref_primary_10_1016_j_compenvurbsys_2024_102242 crossref_primary_10_1051_medsci_2024091 crossref_primary_10_1200_CCI_23_00071 crossref_primary_10_1093_jamia_ocac131 crossref_primary_10_1038_s41598_024_57207_7 crossref_primary_10_3389_frai_2025_1533508 crossref_primary_10_1038_s41598_024_69812_7 crossref_primary_10_1016_j_tips_2023_06_010 crossref_primary_10_1038_s41598_024_51268_4 crossref_primary_10_1145_3636424 crossref_primary_10_1093_jamiaopen_ooac083 crossref_primary_10_1016_j_ijmedinf_2024_105413 crossref_primary_10_1016_j_atech_2023_100361 crossref_primary_10_1109_ACCESS_2024_3366556 crossref_primary_10_1016_j_isci_2022_105331 crossref_primary_10_1109_OJEMB_2024_3426910 crossref_primary_10_1145_3704437 crossref_primary_10_1007_s10618_024_01081_4 crossref_primary_10_1200_CCI_23_00021 crossref_primary_10_1186_s12874_023_01869_w crossref_primary_10_1109_JBHI_2023_3236722 crossref_primary_10_1016_j_compbiomed_2024_108734 crossref_primary_10_1186_s12911_024_02731_9 crossref_primary_10_1002_pds_70019 crossref_primary_10_3390_electronics11203277 crossref_primary_10_1093_jamiaopen_ooae114 crossref_primary_10_2196_66821 crossref_primary_10_1109_ACCESS_2025_3532128 crossref_primary_10_1002_cpt_3001 crossref_primary_10_20948_prepr_2024_53
Cites_doi	10.2478/popets-2019-0067 10.1109/cbms.2019.00036 10.46300/9101 10.3233/sji-150153 10.1007/978-3-319-11257-2_15 10.1080/09332480.2004.10554907 10.1016/j.csda.2011.06.006 10.1023/A:1010920819831 10.29012/jpc.v7i3.407 10.2196/16492 10.1186/s12874-020-00977-1 10.14778/3231751.3231757 10.1093/jamiaopen/ooaa060 10.1145/1143844.1143874 10.1111/j.1467-985x.2004.00343.x 10.1037/pspp0000208 10.1093/biomet/70.1.41 10.1093/jamia/ocz161 10.1007/978-3-319-99771-1_9 10.1136/bmjopen-2020-043497 10.2196/18910 10.1007/978-1-4612-1166-2 10.1287/mnsc.6.4.366 10.1007/springerreference_64338 10.1111/rssa.12358 10.1145/3085504.3091117 10.1093/jamiaopen/ooab012 10.1198/000313006x124640 10.1007/978-3-319-99771-1_5 10.1080/19345747.2019.1631421 10.1093/jamia/ocaa249 10.3390/app11052158 10.1038/s41746-020-00353-9 10.1161/CIRCOUTCOMES.118.005122 10.3233/sji-160959 10.1109/SmartGridComm.2018.8587464 10.1016/j.jclinepi.2019.02.004 10.2196/23139 10.29012/jpc.v1i1.568 10.3929/ethz-b-000392473 10.1093/bioinformatics/btm158 10.1145/3372297.3417238
ContentType	Journal Article
Copyright	Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI	10.2196/35734
Discipline	Medicine
EISSN	2291-9694
ExternalDocumentID	oai_doaj_org_article_a29d415257434ee8a2598b8cc3dc3ee2 PMC9030990 35389366 10_2196_35734
Genre	Journal Article
ISSN	2291-9694
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	4
Keywords	model validation; medical informatics; prediction model; synthetic data generation; utility metric; synthetic data; generative models; data privacy; data utility; binary prediction model; logistic regression
Language	English
License	Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
ORCID	0000-0002-5571-7004 0000-0002-0070-8362 0000-0002-5289-8372 0000-0003-3325-4149
OpenAccessLink	http://journals.scholarsportal.info/openUrl.xqy?doi=10.2196/35734
PMID	35389366
PublicationDate	20220407
PublicationDateYYYYMMDD	2022-04-07
PublicationPlace	Canada
PublicationPlace_xml	– name: Canada – name: Toronto – name: Toronto, Canada
PublicationTitle	JMIR medical informatics
PublicationTitleAlternate	JMIR Med Inform
PublicationYear	2022
Publisher	JMIR Publications
StartPage	e35734
URI	https://www.ncbi.nlm.nih.gov/pubmed/35389366 https://www.proquest.com/docview/2657516028 https://www.proquest.com/docview/2648062945 https://pubmed.ncbi.nlm.nih.gov/PMC9030990 https://doaj.org/article/a29d415257434ee8a2598b8cc3dc3ee2
Volume	10