iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data


Bibliographic Details
Published in	Briefings in bioinformatics Vol. 21; no. 3; pp. 1047 - 1057
Main Authors	Chen, Zhen, Zhao, Pei, Li, Fuyi, Marquez-Lago, Tatiana T, Leier, André, Revote, Jerico, Zhu, Yan, Powell, David R, Akutsu, Tatsuya, Webb, Geoffrey I, Chou, Kuo-Chen, Smith, A Ian, Daly, Roger J, Li, Jian, Song, Jiangning
Format	Journal Article
Language	English
Published	England Oxford University Press 21.05.2020 Oxford Publishing Limited (England)
Subjects	Algorithms; Amino acid sequence; Bioinformatics; Clustering; Computer applications; Deoxyribonucleic acid; DNA; Engineering education; Feature extraction; Gene sequencing; Internet; Learning algorithms; Machine learning; Nucleotide sequence; Proteins; Reduction; Ribonucleic acid; RNA; Software; Toolkits; automated modeling; bioinformatics; data clustering; sequence analysis; integrated platform; machine learning; biomedical data mining; feature selection

Abstract	With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
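The abstract describes a pipeline that turns sequences into numeric descriptors, normalizes them and writes them out for downstream tools. The sketch below illustrates that general idea only; it is not iLearn's code or API. The function names (`kmer_frequencies`, `min_max_normalize`, `to_csv`), the choice of a k-mer frequency descriptor, min-max scaling and CSV output are all assumptions made for illustration, using only the Python standard library.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return k-mer frequencies in a fixed lexicographic order (hypothetical descriptor)."""
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows that contain symbols outside the alphabet
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def to_csv(rows):
    """Serialize a feature matrix as comma-separated text, one sample per line."""
    return "\n".join(",".join(f"{v:.4f}" for v in row) for row in rows)


# Toy DNA sequences, invented for this example.
sequences = ["ACGTACGT", "AAAACCCC", "GGGTTTAA"]
features = [kmer_frequencies(s, k=2) for s in sequences]
normalized = min_max_normalize(features)
print(to_csv(normalized))
```

Keeping the k-mer order fixed means every sequence yields a vector of the same length and column meaning, which is what allows a feature matrix like this to be passed to a separate clustering or classification step.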
Author	Chou, Kuo-Chen Song, Jiangning Smith, A Ian Leier, André Zhao, Pei Li, Fuyi Powell, David R Webb, Geoffrey I Chen, Zhen Akutsu, Tatsuya Daly, Roger J Revote, Jerico Marquez-Lago, Tatiana T Li, Jian Zhu, Yan
Author_xml	1. Chen, Zhen (chenzhen-win2009@163.com): School of Basic Medical Science, Qingdao University, 38 Dengzhou Road, Qingdao, 266021, Shandong, China
2. Zhao, Pei (zhaopei1986@126.com): State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, 455000, China
3. Li, Fuyi (fuyi.li1@monash.edu; ORCID 0000-0001-5216-3213): Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
4. Marquez-Lago, Tatiana T (tmarquezlago@uabmc.edu): Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
5. Leier, André (aleier@uabmc.edu): Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
6. Revote, Jerico (jerico.revote@monash.edu): Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
7. Zhu, Yan (Yan.Zhu@monash.edu): Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
8. Powell, David R (david.powell@monash.edu): Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
9. Akutsu, Tatsuya (takutsu@kuicr.kyoto-u.ac.jp): Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
10. Webb, Geoffrey I (Geoff.Webb@monash.edu): Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
11. Chou, Kuo-Chen (kcchou@gordonlifescience.org): Gordon Life Science Institute, Boston, MA 02478, USA
12. Smith, A Ian (Ian.Smith@monash.edu): Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
13. Daly, Roger J (Roger.Daly@monash.edu): Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
14. Li, Jian (Jian.Li@monash.edu): Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
15. Song, Jiangning (Jiangning.Song@monash.edu; ORCID 0000-0001-8031-9086): Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/31067315 (MEDLINE/PubMed record)
ContentType	Journal Article
Copyright	The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
DOI	10.1093/bib/bbz041
DatabaseName	CrossRef PubMed Biotechnology Research Abstracts Computer and Information Systems Abstracts Technology Research Database Engineering Research Database ProQuest Computer Science Collection ProQuest Health & Medical Complete (Alumni) Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional Biotechnology and BioEngineering Abstracts Genetics Abstracts MEDLINE - Academic
DatabaseTitle	CrossRef PubMed Genetics Abstracts Biotechnology Research Abstracts Technology Research Database Computer and Information Systems Abstracts – Academic ProQuest Computer Science Collection Computer and Information Systems Abstracts ProQuest Health & Medical Complete (Alumni) Engineering Research Database Advanced Technologies Database with Aerospace Biotechnology and BioEngineering Abstracts Computer and Information Systems Abstracts Professional MEDLINE - Academic
DatabaseTitleList	PubMed Genetics Abstracts CrossRef MEDLINE - Academic
Discipline	Biology
EISSN	1477-4054
EndPage	1057
ExternalDocumentID	31067315 10_1093_bib_bbz041 10.1093/bib/bbz041
Genre	Journal Article
ISSN	1467-5463 1477-4054
IsPeerReviewed	true
IsScholarly	true
Issue	3
Keywords	automated modeling bioinformatics data clustering sequence analysis integrated platform machine learning biomedical data mining feature selection
Language	English
License	This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model). The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
ORCID	0000-0001-8031-9086 0000-0001-5216-3213
PMID	31067315
PageCount	11
PublicationCentury	2000
PublicationDate	2020-05-21
PublicationDecade	2020
PublicationPlace	England
PublicationPlace_xml	– name: England – name: Oxford
PublicationTitle	Briefings in bioinformatics
PublicationTitleAlternate	Brief Bioinform
PublicationYear	2020
Publisher	Oxford University Press Oxford Publishing Limited (England)
StartPage	1047
SubjectTerms	Algorithms Amino acid sequence Bioinformatics Clustering Computer applications Deoxyribonucleic acid DNA Engineering education Feature extraction Gene sequencing Internet Learning algorithms Machine learning Nucleotide sequence Proteins Reduction Ribonucleic acid RNA Software Toolkits
Title	iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
URI	https://www.ncbi.nlm.nih.gov/pubmed/31067315 https://www.proquest.com/docview/2429011102 https://www.proquest.com/docview/2231894051
Volume	21