Biological gene extraction path based on knowledge graph and natural language processing

Bibliographic Details
Published in Frontiers in Genetics Vol. 13; p. 1086379
Main Authors Zhang, Canlin, Cao, Xiaopei
Format Journal Article
Language English
Published Switzerland Frontiers Media S.A 13.01.2023
Abstract Advances in science and technology have improved the prospects for maintaining health and for preventing and controlling disease, and successive generations of bioinformatics technology have transformed biological gene research. A long-standing problem nevertheless remains: how to find the genes most worth studying among a large number of sample genes, so that unnecessary work and research costs are reduced. Studying the extraction path of biological genes can help researchers identify the most valuable genes and avoid wasted time and effort. To address this problem, this paper used the Bhattacharyya distance index and the Gini index to screen sample genes when extracting the characteristic genes of breast cancer. From the 49 selected public genes, 6 principal components were extracted by principal component analysis (PCA), and the results were then tested experimentally. The gene recognition rate peaked at 90.31% when the number of characteristic genes was set to 5, which met the experimental requirements. The experiment also showed that the characteristic gene extraction method designed in this paper removed 99.75% of redundant genes, which can greatly reduce the time and financial cost of research.
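The screening step named in the abstract can be illustrated with a minimal sketch. This record does not say how the paper computes or combines the two indices, so everything below is an assumption made for illustration: each gene is scored on its own, the two classes (for example tumour and normal) are modelled as one-dimensional Gaussians for the Bhattacharyya distance, the Gini index is taken as the lowest weighted impurity over single-threshold splits, and the function names `bhattacharyya_distance`, `gini_impurity`, `best_split_gini`, and `rank_genes` are hypothetical. The PCA step is not shown.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_distance(a, b, eps=1e-12):
    """Bhattacharyya distance between two samples, each modelled as a 1-D Gaussian (assumption)."""
    var_a = pvariance(a) + eps  # eps avoids division by zero for constant samples
    var_b = pvariance(b) + eps
    mean_term = 0.25 * (mean(a) - mean(b)) ** 2 / (var_a + var_b)
    var_term = 0.5 * math.log((var_a + var_b) / (2.0 * math.sqrt(var_a * var_b)))
    return mean_term + var_term


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gini(values, labels):
    """Lowest weighted Gini impurity over all single-threshold splits of one gene."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = gini_impurity(labels)  # impurity with no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal expression values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini_impurity(left)
                    + len(right) * gini_impurity(right)) / n
        best = min(best, weighted)
    return best


def rank_genes(expression, labels, top_k):
    """Return the top_k gene names by Bhattacharyya distance between class 1 and class 0.

    expression maps a gene name to a list of values aligned with labels (0 or 1).
    """
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = bhattacharyya_distance(pos, neg)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

In this sketch a large Bhattacharyya distance and a low split impurity both indicate a gene whose expression separates the two classes well; genes that score poorly on such measures are the kind of redundant candidates a screening step would discard before PCA.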
Author Zhang, Canlin
Cao, Xiaopei
AuthorAffiliation 1 Sorenson Communications, Salt Lake City, UT, United States
2 College of Creative Culture and Communication, Zhejiang Normal University, Jinhua, Zhejiang, China
ContentType Journal Article
Copyright Copyright © 2023 Zhang and Cao.
DOI 10.3389/fgene.2022.1086379
Discipline Biology
EISSN 1664-8021
ISSN 1664-8021
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords biological gene
biological gene extraction
path research
natural language processing
knowledge graph
License Copyright © 2023 Zhang and Cao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Notes Edited by: Deepak Kumar Jain, Chongqing University of Posts and Telecommunications, China
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
Reviewed by: Lei Shi, Luliang University, China
Fenghui Dong, Nanjing Forestry University, China
Tiefeng Wu, Qingdao University of Technology, China
OpenAccessLink http://journals.scholarsportal.info/openUrl.xqy?doi=10.3389/fgene.2022.1086379
PMID 36712855