Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study
A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. T...
Published in | JMIR medical informatics Vol. 10; no. 4; p. e35734 |
---|---|
Main Authors | El Emam, Khaled; Mosquera, Lucy; Fang, Xi; El-Hussuna, Alaa |
Format | Journal Article |
Language | English |
Published | Canada: JMIR Publications, 07.04.2022 |
Abstract | Background: A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods.
Objective: This study evaluates the ability of common utility metrics to rank SDG methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research.
Methods: We evaluated 6 utility metrics on 30 different health data sets and 3 different SDG methods (a Bayesian network, a Generative Adversarial Network, and sequential tree synthesis). These metrics were computed by averaging across 20 synthetic data sets from the same generative model. The metrics were then tested on their ability to rank the SDG methods based on prediction performance. Prediction performance was defined as the difference between each of the area under the receiver operating characteristic curve and area under the precision-recall curve values on synthetic data logistic regression prediction models versus real data models.
Results: The utility metric best able to rank SDG methods was the multivariate Hellinger distance based on a Gaussian copula representation of real and synthetic joint distributions.
Conclusions: This study has validated a generative model utility metric, the multivariate Hellinger distance, which can be used to reliably rank competing SDG methods on the same data set. The Hellinger distance metric can be used to evaluate and compare alternate SDG methods. |
---|---|
Author | El Emam, Khaled (ORCID 0000-0003-3325-4149); Mosquera, Lucy (ORCID 0000-0002-5289-8372); Fang, Xi (ORCID 0000-0002-5571-7004); El-Hussuna, Alaa (ORCID 0000-0002-0070-8362) |
AuthorAffiliation | 1 School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada; 2 Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada; 3 Replica Analytics Ltd, Ottawa, ON, Canada; 4 Open Source Research Collaboration, Aarlberg, Denmark |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/35389366 (View this record in MEDLINE/PubMed) |
ContentType | Journal Article |
Copyright | Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022. 2022. This work is licensed under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.2196/35734 |
Discipline | Medicine |
EISSN | 2291-9694 |
ExternalDocumentID | oai_doaj_org_article_a29d415257434ee8a2598b8cc3dc3ee2 PMC9030990 35389366 10_2196_35734 |
Genre | Journal Article |
ISSN | 2291-9694 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | model validation; medical informatics; prediction model; synthetic data generation; utility metric; synthetic data; generative models; data privacy; data utility; binary prediction model; logistic regression |
Language | English |
License | Khaled El Emam, Lucy Mosquera, Xi Fang, Alaa El-Hussuna. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 07.04.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.2196/35734 |
PMID | 35389366 |
PublicationDate | 2022-04-07 |
PublicationPlace | Toronto, Canada |
PublicationTitle | JMIR medical informatics |
PublicationTitleAlternate | JMIR Med Inform |
PublicationYear | 2022 |
Publisher | JMIR Publications |
StartPage | e35734 |
SubjectTerms | Cluster analysis; Datasets; Decision making; Multimedia; Original Paper; Privacy; Time series; Validation studies; Workloads |
Title | Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study |
URI | https://www.ncbi.nlm.nih.gov/pubmed/35389366 https://www.proquest.com/docview/2657516028 https://www.proquest.com/docview/2648062945 https://pubmed.ncbi.nlm.nih.gov/PMC9030990 https://doaj.org/article/a29d415257434ee8a2598b8cc3dc3ee2 |
Volume | 10 |
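
Illustrative sketch (not part of the catalog record and not the authors' implementation): the abstract names the multivariate Hellinger distance on a Gaussian copula representation as the metric best able to rank SDG methods. The Python below shows one way such a distance could be computed, under assumptions that are ours rather than the paper's: all variables are continuous, each data set is mapped to normal scores using its own ranks (so only the dependence structure is compared), and the closed-form Hellinger distance between two zero-mean multivariate normals is used. The function names `normal_scores`, `copula_correlation`, `determinant`, and `hellinger_gaussian_copula` are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(column):
    """Replace each value by the standard-normal quantile of its rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])  # ties keep input order
    scores = [0.0] * n
    for rank, index in enumerate(order, start=1):
        # rank / (n + 1) stays strictly inside (0, 1), so inv_cdf is defined
        scores[index] = NormalDist().inv_cdf(rank / (n + 1))
    return scores


def copula_correlation(rows):
    """Correlation matrix of the normal scores (assumed copula parameter)."""
    cols = [normal_scores(list(col)) for col in zip(*rows)]

    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        syy = sum((b - my) ** 2 for b in y)
        return sxy / math.sqrt(sxx * syy)

    d = len(cols)
    return [[1.0 if i == j else corr(cols[i], cols[j]) for j in range(d)]
            for i in range(d)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(a[r][k]))
        if abs(a[pivot][k]) < 1e-12:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row swap flips the sign
        det *= a[k][k]
        for r in range(k + 1, n):
            factor = a[r][k] / a[k][k]
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return det


def hellinger_gaussian_copula(real_rows, synth_rows):
    """Hellinger distance between N(0, R_real) and N(0, R_synth)."""
    r1 = copula_correlation(real_rows)
    r2 = copula_correlation(synth_rows)
    mean = [[(a + b) / 2 for a, b in zip(row1, row2)]
            for row1, row2 in zip(r1, r2)]
    det1, det2, det_mean = determinant(r1), determinant(r2), determinant(mean)
    if det_mean <= 0.0:
        raise ValueError("degenerate correlation matrices; distance undefined")
    # Bhattacharyya coefficient for equal means; clamps guard against rounding
    bc = (max(det1, 0.0) * max(det2, 0.0)) ** 0.25 / math.sqrt(det_mean)
    return math.sqrt(max(0.0, 1.0 - bc))


# Toy data (made up for illustration): two variables, five records each.
real = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
synth = [(1, 6), (2, 4), (3, 5), (4, 3), (5, 2)]
print(hellinger_gaussian_copula(real, synth))
```

The distance lies between 0 (identical dependence structure) and 1. In the setting the abstract describes, a value like this would be averaged across the 20 synthetic data sets produced by one generative model before SDG methods are compared; categorical variables and marginal differences would need handling that this sketch omits.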