A Machine Learning Classifier to Identify and Prioritise Genes Associated with Cardiac Development

Bibliographic Details
Published in bioRxiv
Main Authors Kabir, Mitra; Hartill, Verity; Farr, Gist H; Wasay Mohiuddin Shaikh Qureshi; Baross, Stephanie L; Doig, Andrew J; Talavera, David; Keavney, Bernard D; Maves, Lisa; Johnson, Colin A; Hentges, Kathryn E
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 08.11.2024
Abstract Congenital heart disease (CHD) is a major cause of infant mortality and presents life-long challenges to individuals living with these conditions. Genetic causes are known for only a minority of types of CHD. Discovering further genetic causes is limited by challenges in prioritising candidate CHD genes. We examined a wide range of features of mouse genes, including sequence characteristics, protein localisation and interaction data, developmental expression data and gene ontology annotations. Many features differ between cardiac development and non-cardiac genes, suggesting that these two gene types can be distinguished by their attributes. Therefore, we developed a supervised machine learning (ML) method to identify Mus musculus genes with a high probability of being involved in cardiac development. These genes, when mutated, are candidates for causing human CHD. Our classifier showed a cross-validation accuracy of 81% in detecting cardiac and non-cardiac genes. From our classifier we generated predictions of the cardiac development association status for all protein-coding genes in the mouse genome. We also cross-referenced our predictions with datasets of known human CHD genes, determining which are orthologues of predicted mouse cardiac genes. Our predicted cardiac genes have a high overlap with human CHD genes. Thus, our predictions could inform the prioritisation of genes when evaluating CHD patient sequence data for genetic diagnosis. Knowledge of cardiac developmental genes may speed up reaching a genetic diagnosis for patients born with CHD.
Competing Interest Statement The authors have declared no competing interest.
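Note on the reported metric: the abstract gives a cross-validation accuracy for separating cardiac from non-cardiac genes, but this record does not state which learning algorithm, features or fold count were used. The sketch below is only an illustration of how a k-fold cross-validation accuracy can be computed. Everything in it is an assumption made for the example: the synthetic feature table, the nearest-centroid classifier, the choice of 10 folds, and all function names. It is not the authors' method and it does not reproduce their result.

```python
import random
from statistics import mean


def make_toy_genes(n_per_class=100, n_features=4, shift=1.5, seed=0):
    """Synthetic stand-in for a labelled gene table (hypothetical data).

    Label 1 plays the role of "cardiac", label 0 of "non-cardiac".
    """
    rng = random.Random(seed)
    rows = []
    for label in (0, 1):
        for _ in range(n_per_class):
            features = [rng.gauss(label * shift, 1.0) for _ in range(n_features)]
            rows.append((features, label))
    rng.shuffle(rows)
    return rows


def fit_centroids(rows):
    """Compute the mean feature vector of each class (nearest-centroid model)."""
    centroids = {}
    for label in {lab for _, lab in rows}:
        members = [f for f, lab in rows if lab == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))


def cross_val_accuracy(rows, k=10):
    """Mean held-out accuracy over k folds; each row is tested exactly once."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the one currently held out.
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_centroids(train)
        hits = sum(predict(model, f) == lab for f, lab in test)
        scores.append(hits / len(test))
    return mean(scores)


genes = make_toy_genes()
accuracy = cross_val_accuracy(genes, k=10)
print(f"10-fold cross-validation accuracy: {accuracy:.2f}")
```

The point of the held-out folds is that every gene is scored by a model that never saw it during fitting, so the averaged accuracy estimates performance on unseen genes, not the fit to the training set.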
Copyright 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1101/2024.11.08.622603
Genre Working Paper/Pre-Print
Open Access Yes
Peer Reviewed No
Subject Terms Amino acid sequence
Coronary artery disease
Diagnosis
Genes
Genetic screening
Heart diseases
Infant mortality
Learning algorithms
Machine learning
Nucleotide sequence
Predictions
Probability learning
URI https://www.proquest.com/docview/3126179793