A Machine Learning Classifier to Identify and Prioritise Genes Associated with Cardiac Development
Published in | bioRxiv |
---|---|
Format | Paper |
Language | English |
Published | Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 08.11.2024 |
Abstract | Congenital heart disease (CHD) is a major cause of infant mortality and presents life-long challenges to individuals living with these conditions. Genetic causes are known for only a minority of types of CHD. Discovering further genetic causes is limited by challenges in prioritising candidate CHD genes. We examined a wide range of features of mouse genes, including sequence characteristics, protein localisation and interaction data, developmental expression data and gene ontology annotations. Many features differ between cardiac development genes and non-cardiac genes, suggesting that these two gene types can be distinguished by their attributes. Therefore, we developed a supervised machine learning (ML) method to identify Mus musculus genes with a high probability of being involved in cardiac development. These genes, when mutated, are candidates for causing human CHD. Our classifier showed a cross-validation accuracy of 81% in distinguishing cardiac from non-cardiac genes. Using our classifier, we generated predictions of the cardiac development association status for all protein-coding genes in the mouse genome. We also cross-referenced our predictions with datasets of known human CHD genes, determining which are orthologues of predicted mouse cardiac genes. Our predicted cardiac genes have a high overlap with human CHD genes. Thus, our predictions could inform the prioritisation of genes when evaluating CHD patient sequence data for genetic diagnosis. Knowledge of cardiac developmental genes may speed up reaching a genetic diagnosis for patients born with CHD. Competing Interest Statement: The authors have declared no competing interest. |
---|---|
Author | Kabir, Mitra; Hartill, Verity; Farr, Gist H; Wasay Mohiuddin Shaikh Qureshi; Baross, Stephanie L; Doig, Andrew J; Talavera, David; Keavney, Bernard D; Maves, Lisa; Johnson, Colin A; Hentges, Kathryn E |
Copyright | 2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (“the License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.1101/2024.11.08.622603 |
Genre | Working Paper/Pre-Print |
SubjectTerms | Amino acid sequence; Coronary artery disease; Diagnosis; Genes; Genetic screening; Heart diseases; Infant mortality; Learning algorithms; Machine learning; Nucleotide sequence; Predictions; Probability learning |
URI | https://www.proquest.com/docview/3126179793 |
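
The abstract reports a cross-validation accuracy for a supervised classifier that separates cardiac from non-cardiac genes. As a rough illustration of what such a figure measures, the sketch below runs k-fold cross-validation on synthetic data. It is not the authors' pipeline: the record does not name the learning algorithm or the feature encoding, so the nearest-centroid classifier, the Gaussian features, and every function name and parameter here are assumptions chosen only to keep the example self-contained.

```python
import random
from statistics import mean


def make_synthetic_genes(n_per_class=100, n_features=4, shift=1.5, seed=0):
    """Return (features, labels) for made-up genes.

    Label 1 stands in for 'cardiac', 0 for 'non-cardiac'. The features are
    random numbers, not real sequence, expression, or ontology data.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean accuracy over k folds of a nearest-centroid classifier.

    Nearest-centroid is a stand-in (assumption); any supervised model
    could be trained in its place inside the loop.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        # Fit: one centroid per class, using training genes only.
        cents = {
            c: centroid([X[i] for i in train_idx if y[i] == c]) for c in (0, 1)
        }
        # Predict each held-out gene as the class of the closer centroid.
        correct = sum(
            min(cents, key=lambda c: sq_dist(X[i], cents[c])) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return mean(scores)


X, y = make_synthetic_genes()
accuracy = cross_val_accuracy(X, y)
print(f"10-fold cross-validation accuracy: {accuracy:.2f}")
```

Each gene is scored exactly once by a model that never saw it during fitting, which is why a cross-validation accuracy is a fairer estimate of performance on unseen genes than accuracy on the training set.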