Biological gene extraction path based on knowledge graph and natural language processing
Published in | Frontiers in genetics Vol. 13; p. 1086379 |
---|---|
Main Authors | Zhang, Canlin; Cao, Xiaopei |
Format | Journal Article |
Language | English |
Published | Switzerland: Frontiers Media S.A, 13.01.2023 |
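Subjects | biological gene; biological gene extraction; Genetics; knowledge graph; natural language processing; path research |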
Abstract | Advances in science and technology have improved the prospects for maintaining health and for preventing and controlling disease, and successive generations of bioinformatics technology have transformed biological gene research. A long-standing problem nevertheless remains: how to find the genes most worth studying among a large number of sample genes, so as to avoid unnecessary work and reduce research costs. Studying the extraction path of biological genes can help researchers extract the most valuable genes and avoid wasting time and effort. To address this problem, this paper used the Bhattacharyya distance index and the Gini index to screen sample genes when extracting the characteristic genes of breast cancer. From the 49 selected public genes, 6 principal components were extracted by principal component analysis (PCA), and the experimental results were then tested. When the number of characteristic genes was set to its optimal value of 5, the gene recognition rate reached its maximum of 90.31%, which met the experimental requirements. The experiment also showed that the characteristic gene extraction method designed in this paper removed 99.75% of redundant genes, which can greatly reduce the time and cost of research. |
---|---|
Author | Zhang, Canlin; Cao, Xiaopei |
AuthorAffiliation | 1 Sorenson Communications, Salt Lake City, UT, United States; 2 College of Creative Culture and Communication, Zhejiang Normal University, Jinhua, Zhejiang, China |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36712855$$D View this record in MEDLINE/PubMed |
Cites_doi | 10.1016/j.cageo.2017.12.007 10.13057/biodiv/d200820 10.1111/jems.12259 10.1109/mci.2018.2840738 10.1016/j.sjbs.2021.01.036 10.21307/jofnem-2018-055 10.1164/rccm.201610-2006OC 10.1155/2017/5072427 10.1177/1724600820925095 10.3923/jbs.2020.13.21 10.1587/transinf.2017swp0006 10.23919/tst.2017.7889640 10.1007/s00521-020-05101-4 10.3233/sw-160218 10.1002/phar.2151 10.1039/c7sc02701j 10.1007/s11390-017-1718-y 10.3329/dujbs.v26i1.46349 10.1145/3132733 10.1080/09168451.2017.1353401 10.1007/s10579-017-9381-z |
ContentType | Journal Article |
Copyright | Copyright © 2023 Zhang and Cao. |
DBID | AAYXX CITATION NPM 7X8 5PM DOA |
DOI | 10.3389/fgene.2022.1086379 |
DatabaseName | CrossRef PubMed MEDLINE - Academic PubMed Central (Full Participant titles) DOAJ Directory of Open Access Journals |
DatabaseTitle | CrossRef PubMed MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic CrossRef PubMed |
Database_xml | – sequence: 1 dbid: DOA name: DOAJ Directory of Open Access Journals url: https://www.doaj.org/ sourceTypes: Open Website – sequence: 2 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Biology |
DocumentTitleAlternate | Zhang and Cao |
EISSN | 1664-8021 |
ExternalDocumentID | oai_doaj_org_article_0c056c1594424f76ae4cf0aca1a65cf4 PMC9880067 36712855 10_3389_fgene_2022_1086379 |
Genre | Journal Article |
GroupedDBID | 53G 5VS 9T4 AAFWJ AAKDD AAYXX ACGFS ACXDI ADBBV ADRAZ AFPKN ALMA_UNASSIGNED_HOLDINGS AOIJS BAWUL BCNDV CITATION DIK EMOBN GROUPED_DOAJ GX1 HYE KQ8 M48 M~E OK1 PGMZT RNS RPM IPNFZ NPM RIG 7X8 5PM |
ID | FETCH-LOGICAL-c419t-d25e2b6b68490c6d1925d96b0e08a505a33712151c4fa12482050e07adece0e3 |
IEDL.DBID | M48 |
ISSN | 1664-8021 |
IngestDate | Wed Aug 27 01:19:13 EDT 2025 Thu Aug 21 18:38:34 EDT 2025 Fri Jul 11 02:46:20 EDT 2025 Mon Jul 21 05:42:44 EDT 2025 Tue Jul 01 02:19:30 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | biological gene; biological gene extraction; path research; natural language processing; knowledge graph |
Language | English |
License | Copyright © 2023 Zhang and Cao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c419t-d25e2b6b68490c6d1925d96b0e08a505a33712151c4fa12482050e07adece0e3 |
Notes | Edited by: Deepak Kumar Jain, Chongqing University of Posts and Telecommunications, China. Reviewed by: Lei Shi, Luliang University, China; Fenghui Dong, Nanjing Forestry University, China; Tiefeng Wu, Qingdao University of Technology, China. This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics. |
OpenAccessLink | http://journals.scholarsportal.info/openUrl.xqy?doi=10.3389/fgene.2022.1086379 |
PMID | 36712855 |
PQID | 2771084272 |
PQPubID | 23479 |
ParticipantIDs | doaj_primary_oai_doaj_org_article_0c056c1594424f76ae4cf0aca1a65cf4 pubmedcentral_primary_oai_pubmedcentral_nih_gov_9880067 proquest_miscellaneous_2771084272 pubmed_primary_36712855 crossref_primary_10_3389_fgene_2022_1086379 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2023-01-13 |
PublicationDateYYYYMMDD | 2023-01-13 |
PublicationDate_xml | – month: 01 year: 2023 text: 2023-01-13 day: 13 |
PublicationDecade | 2020 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland |
PublicationTitle | Frontiers in genetics |
PublicationTitleAlternate | Front Genet |
PublicationYear | 2023 |
Publisher | Frontiers Media S.A |
Publisher_xml | – name: Frontiers Media S.A |
SSID | ssj0000493334 |
Score | 2.3021798 |
SourceID | doaj pubmedcentral proquest pubmed crossref |
SourceType | Open Website Open Access Repository Aggregation Database Index Database |
StartPage | 1086379 |
SubjectTerms | biological gene; biological gene extraction; Genetics; knowledge graph; natural language processing; path research |
Title | Biological gene extraction path based on knowledge graph and natural language processing |
URI | https://www.ncbi.nlm.nih.gov/pubmed/36712855 https://www.proquest.com/docview/2771084272 https://pubmed.ncbi.nlm.nih.gov/PMC9880067 https://doaj.org/article/0c056c1594424f76ae4cf0aca1a65cf4 |
Volume | 13 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | Scholars Portal |
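
The abstract describes screening genes with the Bhattacharyya distance index and then reducing the selected genes with PCA. The following is a minimal sketch of that general idea, not the authors' implementation: it assumes two classes, treats each gene as roughly Gaussian within a class, uses synthetic data, and omits the Gini index step. The function names `bhattacharyya_scores` and `pca_project` are hypothetical, and the values 49 and 6 only mirror the counts quoted in the abstract.

```python
import numpy as np


def bhattacharyya_scores(X, y):
    """Per-gene Bhattacharyya distance between class 0 and class 1.

    Assumes (for illustration only) that each gene is approximately
    Gaussian within each class, which gives a closed-form distance.
    """
    a, b = X[y == 0], X[y == 1]
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    var_a = a.var(axis=0) + 1e-12  # small constant avoids division by zero
    var_b = b.var(axis=0) + 1e-12
    mean_term = 0.25 * (mu_a - mu_b) ** 2 / (var_a + var_b)
    var_term = 0.5 * np.log((var_a + var_b) / (2.0 * np.sqrt(var_a * var_b)))
    return mean_term + var_term


def pca_project(X, n_components):
    """Project samples onto the leading principal components via SVD."""
    centered = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic expression matrix: 60 samples x 200 genes, two balanced classes.
rng = np.random.default_rng(0)
n_samples, n_genes = 60, 200
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[y == 1, :5] += 3.0  # make the first five genes informative (synthetic)

scores = bhattacharyya_scores(X, y)
top = np.argsort(scores)[::-1][:49]     # keep the 49 highest-scoring genes
components = pca_project(X[:, top], 6)  # reduce them to 6 components
```

Genes whose class distributions overlap heavily receive scores near zero and are discarded, which is the sense in which such a filter removes redundant genes before classification.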