Machine learning methods for metabolic pathway prediction

Bibliographic Details
Published in	BMC Bioinformatics, Vol. 11, no. 1, p. 15
Main Authors	Dale, Joseph M; Popescu, Liviu; Karp, Peter D
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd (BioMed Central; BMC), 08.01.2010
Subjects	Artificial Intelligence; Computational Biology - methods; Databases, Factual; DNA sequencing; Genetic algorithms; Genome; Machine learning; Metabolic Networks and Pathways; Nucleotide sequencing; Software; United States
Abstract	A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism. To quantitatively validate methods for pathway prediction, we developed a large "gold standard" dataset of 5,610 pathway instances known to be present or absent in curated metabolic pathway databases for six organisms. We defined a collection of 123 pathway features, whose information content we evaluated with respect to the gold standard. Feature data were used as input to an extensive collection of machine learning (ML) methods, including naïve Bayes, decision trees, and logistic regression, together with feature selection and ensemble methods. We compared the ML methods to the previous PathoLogic algorithm for pathway prediction using the gold standard dataset. We found that ML-based prediction methods can match the performance of the PathoLogic algorithm. PathoLogic achieved an accuracy of 91% and an F-measure of 0.786. The ML-based prediction methods achieved accuracy as high as 91.2% and F-measure as high as 0.787. The ML-based methods output a probability for each predicted pathway, whereas PathoLogic does not, which provides more information to the user and facilitates filtering of predicted pathways. ML methods for pathway prediction perform as well as existing methods, and have qualitative advantages in terms of extensibility, tunability, and explainability. More advanced prediction methods and/or more sophisticated input features may improve the performance of ML methods. However, pathway prediction performance appears to be limited largely by the ability to correctly match enzymes to the reactions they catalyze based on genome annotations.
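The abstract reports accuracy and F-measure and notes that per-pathway probabilities make it easier to filter predictions. The following is a minimal illustrative sketch of that idea, not the authors' code: it assumes the balanced (F1) form of the F-measure, which the abstract does not specify, and the names `Prediction`, `evaluate`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    pathway_id: str      # hypothetical pathway identifier
    probability: float   # classifier's estimated probability that the pathway is present
    present: bool        # gold-standard label (True = pathway is present)


def evaluate(predictions, threshold=0.5):
    """Return (accuracy, F1) after calling a pathway 'present' when
    its probability is at least `threshold` (an assumed cutoff)."""
    tp = fp = tn = fn = 0
    for p in predictions:
        predicted_present = p.probability >= threshold
        if predicted_present and p.present:
            tp += 1
        elif predicted_present and not p.present:
            fp += 1
        elif not predicted_present and p.present:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1
```

Raising `threshold` keeps only the higher-probability predictions, which typically trades recall for precision. A method that outputs only a yes/no call offers no such adjustment.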
ArticleNumber	15
Audience	Academic
Author	Popescu, Liviu; Dale, Joseph M; Karp, Peter D
AuthorAffiliation	1 Bioinformatics Research Group, SRI International, 333 Ravenswood Ave, Menlo Park, CA, 94025, USA
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/20064214 (MEDLINE/PubMed record)
ContentType	Journal Article
Copyright	COPYRIGHT 2010 BioMed Central Ltd.; Copyright ©2010 Dale et al; licensee BioMed Central Ltd.
DOI	10.1186/1471-2105-11-15
Discipline	Biology
EISSN	1471-2105
EndPage	15
Genre	Journal Article Research Support, N.I.H., Extramural
GeographicLocations	United States
GrantInformation	NLM NIH HHS, grant IDs: R01 LM009651; R01 LM009651-02; LM009651
ISSN	1471-2105
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	1
Language	English
License	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
OpenAccessLink	http://dx.doi.org/10.1186/1471-2105-11-15
PMID	20064214
PublicationDate	2010-01-08
PublicationPlace	England
PublicationTitle	BMC Bioinformatics
PublicationYear	2010
Publisher	BioMed Central Ltd; BioMed Central; BMC
StartPage	15
SubjectTerms	Artificial Intelligence Computational Biology - methods Databases, Factual DNA sequencing Genetic algorithms Genome Machine learning Metabolic Networks and Pathways Nucleotide sequencing Software
Title	Machine learning methods for metabolic pathway prediction
URI	https://www.ncbi.nlm.nih.gov/pubmed/20064214 https://search.proquest.com/docview/733314830 http://dx.doi.org/10.1186/1471-2105-11-15 https://pubmed.ncbi.nlm.nih.gov/PMC3146072 https://doaj.org/article/4ed588c9092d4faead02aef9db59364e
Volume	11