Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning: A Qualitative Study

Bibliographic Details
Published in JAMA Network Open Vol. 6; no. 12; p. e2345892
Main Authors Ng, Madelena Y.; Youssef, Alaa; Miner, Adam S.; Sarellano, Daniela; Long, Jin; Larson, David B.; Hernandez-Boussard, Tina; Langlotz, Curtis P.
Format Journal Article
Language English
Published United States: American Medical Association, December 1, 2023
Abstract
Importance: The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML) research in health care and hinders the diagnostic excellence of developed clinical AI applications for patient care.
Objective: To discern what constitutes high-quality and useful data sets for health and biomedical ML research purposes according to subject matter experts.
Design, Setting, and Participants: This qualitative study interviewed data set experts, particularly those who are creators and ML researchers. Semistructured interviews were conducted in English and remotely through a secure video conferencing platform between August 23, 2022, and January 5, 2023. A total of 93 experts were invited to participate. Twenty experts were enrolled and interviewed. Using purposive sampling, experts were affiliated with a diverse representation of 16 health data sets/databases across organizational sectors. Content analysis was used to evaluate survey information and thematic analysis was used to analyze interview data.
Main Outcomes and Measures: Data set experts' perceptions on what makes data sets AI ready.
Results: Participants included 20 data set experts (11 [55%] men; mean [SD] age, 42 [11] years), of whom all were health data set creators, and 18 of the 20 were also ML researchers. Themes (3 main and 11 subthemes) were identified and integrated into an AI-readiness framework to show their association within the health data ecosystem. Participants partially determined the AI readiness of data sets using priority appraisal elements of accuracy, completeness, consistency, and fitness. Ethical acquisition and societal impact emerged as appraisal considerations in that participant samples have not been described to date in prior data quality frameworks. Factors that drive creation of high-quality health data sets and mitigate risks associated with data reuse in ML research were also relevant to AI readiness. The state of data availability, data quality standards, documentation, team science, and incentivization were associated with elements of AI readiness and the overall perception of data set usefulness.
Conclusions and Relevance: In this qualitative study of data set experts, participants contributed to the development of a grounded framework for AI data set quality. Data set AI readiness required the concerted appraisal of many elements and the balancing of transparency and ethical reflection against pragmatic constraints. The movement toward more reliable, relevant, and ethical AI and ML applications for patient care will inevitably require strategic updates to data set creation practices.
This qualitative study examines the perceptions of data set experts on the present status of development of artificial intelligence (AI)–ready data sets for use in machine learning research.
Author Youssef, Alaa
Long, Jin
Ng, Madelena Y.
Hernandez-Boussard, Tina
Sarellano, Daniela
Langlotz, Curtis P.
Larson, David B.
Miner, Adam S.
AuthorAffiliation 1 Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Stanford, California
2 Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California
3 Department of Radiology, Stanford University School of Medicine, Stanford, California
4 Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California
5 Department of Pediatrics, Stanford University School of Medicine, Stanford, California
Author_xml – sequence: 1
  givenname: Madelena Y.
  surname: Ng
  fullname: Ng, Madelena Y.
  organization: Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Stanford, California, Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California
– sequence: 2
  givenname: Alaa
  surname: Youssef
  fullname: Youssef, Alaa
  organization: Department of Radiology, Stanford University School of Medicine, Stanford, California
– sequence: 3
  givenname: Adam S.
  surname: Miner
  fullname: Miner, Adam S.
  organization: Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, California
– sequence: 4
  givenname: Daniela
  surname: Sarellano
  fullname: Sarellano, Daniela
  organization: Department of Radiology, Stanford University School of Medicine, Stanford, California
– sequence: 5
  givenname: Jin
  surname: Long
  fullname: Long, Jin
  organization: Department of Pediatrics, Stanford University School of Medicine, Stanford, California
– sequence: 6
  givenname: David B.
  surname: Larson
  fullname: Larson, David B.
  organization: Department of Radiology, Stanford University School of Medicine, Stanford, California
– sequence: 7
  givenname: Tina
  surname: Hernandez-Boussard
  fullname: Hernandez-Boussard, Tina
  organization: Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Stanford, California, Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California
– sequence: 8
  givenname: Curtis P.
  surname: Langlotz
  fullname: Langlotz, Curtis P.
  organization: Department of Medicine (Biomedical Informatics), Stanford University School of Medicine, Stanford, California, Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, California, Department of Radiology, Stanford University School of Medicine, Stanford, California
BackLink https://www.ncbi.nlm.nih.gov/pubmed/38039004 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1016_j_imavis_2024_105068
crossref_primary_10_1016_j_outlook_2024_102343
crossref_primary_10_1080_10408398_2025_2461237
crossref_primary_10_1097_ALN_0000000000004998
crossref_primary_10_1055_a_2415_8408
crossref_primary_10_3389_fphar_2023_1276149
crossref_primary_10_1371_journal_pdig_0000474
crossref_primary_10_1093_database_baae083
crossref_primary_10_1002_cai2_136
crossref_primary_10_1016_j_comtox_2024_100316
crossref_primary_10_7759_cureus_78068
ContentType Journal Article
Copyright 2023. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Copyright 2023 Ng MY et al.
DOI 10.1001/jamanetworkopen.2023.45892
DocumentTitleAlternate Expert Perceptions of Characteristics of High-Quality AI-Ready Health Data Sets
EISSN 2574-3805
ExternalDocumentID PMC10692863
38039004
10_1001_jamanetworkopen_2023_45892
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NLM NIH HHS
  grantid: T15 LM007033
ISSN 2574-3805
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License This is an open access article distributed under the terms of the CC-BY License.
OpenAccessLink http://dx.doi.org/10.1001/jamanetworkopen.2023.45892
PMID 38039004
PublicationCentury 2000
PublicationDate 2023-12-01
PublicationDateYYYYMMDD 2023-12-01
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-12-01
  day: 01
PublicationDecade 2020
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Chicago
PublicationTitle JAMA network open
PublicationTitleAlternate JAMA Netw Open
PublicationYear 2023
Publisher American Medical Association
Publisher_xml – name: American Medical Association
  publication-title: Annu Rev Biomed Data Sci
  doi: 10.1146/biodatasci.2021.4.issue-1
– volume: 73
  start-page: 593
  issue: 4
  year: 2018
  ident: zoi231335r51
  article-title: The science of teamwork: progress, reflections, and the road ahead.
  publication-title: Am Psychol
  doi: 10.1037/amp0000334
– ident: zoi231335r47
  doi: 10.1145/3531146.3533239
Snippet The lack of data quality frameworks to guide the development of artificial intelligence (AI)-ready data sets limits their usefulness for machine learning (ML)...
This qualitative study examines the perceptions of data set experts on the present status of development of artificial intelligence (AI)–ready data sets for...
SourceID pubmedcentral
proquest
pubmed
crossref
SourceType Open Access Repository
Aggregation Database
Index Database
Enrichment Source
StartPage e2345892
SubjectTerms Adult
Artificial Intelligence
Datasets
Delivery of Health Care
Ethics
Female
Health Informatics
Humans
Machine Learning
Male
Online Only
Original Investigation
Qualitative Research
Subtitle A Qualitative Study
Title Perceptions of Data Set Experts on Important Characteristics of Health Data Sets Ready for Machine Learning
URI https://www.ncbi.nlm.nih.gov/pubmed/38039004
https://www.proquest.com/docview/3139252121
https://www.proquest.com/docview/2896809657
https://pubmed.ncbi.nlm.nih.gov/PMC10692863
Volume 6