Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study

Bibliographic Details
Published in	JAMA Network Open Vol. 6; no. 12; p. e2345892
Main Authors	Ng, Madelena Y.; Youssef, Alaa; Miner, Adam S.; Sarellano, Daniela; Long, Jin; Larson, David B.; Hernandez-Boussard, Tina; Langlotz, Curtis P.
Format	Journal Article
Language	English
Published	United States American Medical Association 01.12.2023
Subjects	Adult; Artificial Intelligence; Datasets; Delivery of Health Care; Ethics; Female; Health Informatics; Humans; Machine Learning; Male; Online Only; Original Investigation; Qualitative Research

Abstract	Importance: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care. It also hinders the diagnostic excellence of clinical AI applications developed for patient care.
Objective: To discern what subject matter experts consider to constitute high-quality and useful data sets for health and biomedical ML research.
Design, Setting, and Participants: This qualitative study interviewed data set experts, particularly data set creators and ML researchers. Semistructured interviews were conducted remotely, in English, through a secure video conferencing platform between August 23, 2022, and January 5, 2023. Of 93 experts invited to participate, 20 were enrolled and interviewed. Experts were selected by purposive sampling and were affiliated with 16 health data sets/databases spanning diverse organizational sectors. Content analysis was used to evaluate survey information, and thematic analysis was used to analyze interview data.
Main Outcomes and Measures: Data set experts' perceptions of what makes data sets AI ready.
Results: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years). All were health data set creators, and 18 of the 20 were also ML researchers. Three main themes and 11 subthemes were identified and integrated into an AI-readiness framework that shows how they are associated within the health data ecosystem. Participants determined the AI readiness of data sets in part by appraising the priority elements of accuracy, completeness, consistency, and fitness. In this participant sample, ethical acquisition and societal impact emerged as appraisal considerations that have not been described to date in prior data quality frameworks. Factors that drive the creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and with the overall perception of data set usefulness.
Conclusions and Relevance: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Establishing the AI readiness of a data set required the concerted appraisal of many elements. It also required balancing transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.
AuthorAffiliation	Ng, Madelena Y. (1, 2); Youssef, Alaa (3); Miner, Adam S. (4); Sarellano, Daniela (3); Long, Jin (5); Larson, David B. (3); Hernandez-Boussard, Tina (1, 2); Langlotz, Curtis P. (1, 2, 3)
1 Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Stanford, California
2 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California
3 Department of Radiology, Stanford University School of Medicine, Stanford, California
4 Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California
5 Department of Pediatrics, Stanford University School of Medicine, Stanford, California
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/38039004 (View this record in MEDLINE/PubMed)
CitedBy_id	crossref_primary_10_1016_j_imavis_2024_105068 crossref_primary_10_1016_j_outlook_2024_102343 crossref_primary_10_1080_10408398_2025_2461237 crossref_primary_10_1097_ALN_0000000000004998 crossref_primary_10_1055_a_2415_8408 crossref_primary_10_3389_fphar_2023_1276149 crossref_primary_10_1371_journal_pdig_0000474 crossref_primary_10_1093_database_baae083 crossref_primary_10_1002_cai2_136 crossref_primary_10_1016_j_comtox_2024_100316 crossref_primary_10_7759_cureus_78068
ContentType	Journal Article
Copyright	2023. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. Copyright 2023 Ng MY et al.
DBID	AAYXX CITATION CGR CUY CVF ECM EIF NPM K9. 7X8 5PM
DOI	10.1001/jamanetworkopen.2023.45892
DatabaseName	CrossRef; MEDLINE; MEDLINE (Ovid); PubMed; ProQuest Health & Medical Complete (Alumni); MEDLINE - Academic; PubMed Central (Full Participant titles)
DatabaseTitle	CrossRef MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) ProQuest Health & Medical Complete (Alumni) MEDLINE - Academic
DatabaseTitleList	ProQuest Health & Medical Complete (Alumni) MEDLINE - Academic MEDLINE
DeliveryMethod	fulltext_linktorsrc
DocumentTitleAlternate	Expert Perceptions of Characteristics of High-Quality AI-Ready Health Data Sets
EISSN	2574-3805
ExternalDocumentID	PMC10692863 38039004 10_1001_jamanetworkopen_2023_45892
Genre	Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural
GrantInformation_xml	– fundername: NLM NIH HHS grantid: T15 LM007033
ISSN	2574-3805
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	12
Language	English
License	This is an open access article distributed under the terms of the CC-BY License.
LinkModel	OpenURL
OpenAccessLink	http://dx.doi.org/10.1001/jamanetworkopen.2023.45892
PMID	38039004
PQID	3139252121
PQPubID	5319538
ParticipantIDs	pubmedcentral_primary_oai_pubmedcentral_nih_gov_10692863 proquest_miscellaneous_2896809657 proquest_journals_3139252121 pubmed_primary_38039004 crossref_citationtrail_10_1001_jamanetworkopen_2023_45892 crossref_primary_10_1001_jamanetworkopen_2023_45892
ProviderPackageCode	CITATION AAYXX
PublicationCentury	2000
PublicationDate	2023-12-01
PublicationDateYYYYMMDD	2023-12-01
PublicationDecade	2020
PublicationPlace	Chicago, United States
PublicationTitle	JAMA Network Open
PublicationTitleAlternate	JAMA Netw Open
PublicationYear	2023
Publisher	American Medical Association
SSID	ssj0002013965
SourceID	pubmedcentral proquest pubmed crossref
SourceType	Open Access Repository Aggregation Database Index Database Enrichment Source
StartPage	e2345892
Subtitle	A Qualitative Study
Title	Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning
URI	https://www.ncbi.nlm.nih.gov/pubmed/38039004 https://www.proquest.com/docview/3139252121 https://www.proquest.com/docview/2896809657 https://pubmed.ncbi.nlm.nih.gov/PMC10692863
Volume	6
linkProvider	ProQuest
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Perceptions+of+Data+Set+Experts+on+Important+Characteristics+of+Health+Data+Sets+Ready+for+Machine+Learning%3A+A+Qualitative+Study&rft.jtitle=JAMA+network+open&rft.au=Ng%2C+Madelena+Y&rft.au=Youssef%2C+Alaa&rft.au=Miner%2C+Adam+S&rft.au=Sarellano%2C+Daniela&rft.date=2023-12-01&rft.pub=American+Medical+Association&rft.eissn=2574-3805&rft.volume=6&rft.issue=12&rft.spage=e2345892&rft_id=info:doi/10.1001%2Fjamanetworkopen.2023.45892&rft.externalDBID=NO_FULL_TEXT