iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data

Bibliographic Details
Published in Briefings in bioinformatics Vol. 21; no. 3; pp. 1047-1057
Main Authors Chen, Zhen; Zhao, Pei; Li, Fuyi; Marquez-Lago, Tatiana T; Leier, André; Revote, Jerico; Zhu, Yan; Powell, David R; Akutsu, Tatsuya; Webb, Geoffrey I; Chou, Kuo-Chen; Smith, A Ian; Daly, Roger J; Li, Jian; Song, Jiangning
Format Journal Article
Language English
Published England Oxford University Press 21.05.2020
Oxford Publishing Limited (England)
Abstract With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
Author_xml – sequence: 1
  givenname: Zhen
  surname: Chen
  fullname: Chen, Zhen
  email: chenzhen-win2009@163.com
  organization: School of Basic Medical Science, Qingdao University, 38 Dengzhou Road, Qingdao, 266021, Shandong, China
– sequence: 2
  givenname: Pei
  surname: Zhao
  fullname: Zhao, Pei
  email: zhaopei1986@126.com
  organization: State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences (CAAS), Anyang, 455000, China
– sequence: 3
  givenname: Fuyi
  orcidid: 0000-0001-5216-3213
  surname: Li
  fullname: Li, Fuyi
  email: fuyi.li1@monash.edu
  organization: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 4
  givenname: Tatiana T
  surname: Marquez-Lago
  fullname: Marquez-Lago, Tatiana T
  email: tmarquezlago@uabmc.edu
  organization: Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
– sequence: 5
  givenname: André
  surname: Leier
  fullname: Leier, André
  email: aleier@uabmc.edu
  organization: Department of Genetics, School of Medicine, University of Alabama at Birmingham, USA
– sequence: 6
  givenname: Jerico
  surname: Revote
  fullname: Revote, Jerico
  email: jerico.revote@monash.edu
  organization: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 7
  givenname: Yan
  surname: Zhu
  fullname: Zhu, Yan
  email: Yan.Zhu@monash.edu
  organization: Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 8
  givenname: David R
  surname: Powell
  fullname: Powell, David R
  email: david.powell@monash.edu
  organization: Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 9
  givenname: Tatsuya
  surname: Akutsu
  fullname: Akutsu, Tatsuya
  email: takutsu@kuicr.kyoto-u.ac.jp
  organization: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto 611-0011, Japan
– sequence: 10
  givenname: Geoffrey I
  surname: Webb
  fullname: Webb, Geoffrey I
  email: Geoff.Webb@monash.edu
  organization: Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 11
  givenname: Kuo-Chen
  surname: Chou
  fullname: Chou, Kuo-Chen
  email: kcchou@gordonlifescience.org
  organization: Gordon Life Science Institute, Boston, MA 02478, USA
– sequence: 12
  givenname: A Ian
  surname: Smith
  fullname: Smith, A Ian
  email: Ian.Smith@monash.edu
  organization: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 13
  givenname: Roger J
  surname: Daly
  fullname: Daly, Roger J
  email: Roger.Daly@monash.edu
  organization: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 14
  givenname: Jian
  surname: Li
  fullname: Li, Jian
  email: Jian.Li@monash.edu
  organization: Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
– sequence: 15
  givenname: Jiangning
  orcidid: 0000-0001-8031-9086
  surname: Song
  fullname: Song, Jiangning
  email: Jiangning.Song@monash.edu
  organization: Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
BackLink https://www.ncbi.nlm.nih.gov/pubmed/31067315 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com 2020
DOI 10.1093/bib/bbz041
DatabaseName CrossRef
PubMed
Biotechnology Research Abstracts
Computer and Information Systems Abstracts
Technology Research Database
Engineering Research Database
ProQuest Computer Science Collection
ProQuest Health & Medical Complete (Alumni)
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
Discipline Biology
EISSN 1477-4054
EndPage 1057
ExternalDocumentID 31067315
10_1093_bib_bbz041
10.1093/bib/bbz041
Genre Journal Article
ISSN 1467-5463
1477-4054
IsPeerReviewed true
IsScholarly true
Issue 3
Keywords automated modeling
bioinformatics
data clustering
sequence analysis
integrated platform
machine learning
biomedical data mining
feature selection
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
The Author(s) 2019. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c411t-5c3473f7a0b6d2d41993e2599371232f297801cff1f67e4f769c5445f7e3526c3
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0001-8031-9086
0000-0001-5216-3213
PMID 31067315
PQID 2429011102
PQPubID 26846
PageCount 11
ParticipantIDs proquest_miscellaneous_2231894051
proquest_journals_2429011102
pubmed_primary_31067315
crossref_citationtrail_10_1093_bib_bbz041
crossref_primary_10_1093_bib_bbz041
oup_primary_10_1093_bib_bbz041
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2020-05-21
PublicationDateYYYYMMDD 2020-05-21
PublicationDate_xml – month: 05
  year: 2020
  text: 2020-05-21
  day: 21
PublicationDecade 2020
PublicationPlace England
PublicationPlace_xml – name: England
– name: Oxford
PublicationTitle Briefings in bioinformatics
PublicationTitleAlternate Brief Bioinform
PublicationYear 2020
Publisher Oxford University Press
Oxford Publishing Limited (England)
Publisher_xml – name: Oxford University Press
– name: Oxford Publishing Limited (England)
References Li (2020051819282474700_ref57) 2018; 34
He (2020051819282474700_ref46) 2018; 12
Feng (2020051819282474700_ref58) 2013; 442
Song (2020051819282474700_ref66) 2018; 9
Song (2020051819282474700_ref26) 2018; 443
Chou (2020051819282474700_ref17) 2018; 46
Jain (2020051819282474700_ref48) 1999; 31
Cao (2020051819282474700_ref32) 2013; 29
Yan (2020051819282474700_ref5) 2017; 79
Guo (2020051819282474700_ref22) 2014; 30
Chen (2020051819282474700_ref12) 2013; 1834
Libbrecht (2020051819282474700_ref39) 2015; 16
Zhang (2020051819282474700_ref19) 2017; 18
Shen (2020051819282474700_ref7) 2007; 104
Chou (2020051819282474700_ref54) 2007; 370
Liu (2020051819282474700_ref25) 2015; 31
Frey (2020051819282474700_ref53) 2007; 315
Zhou (2020051819282474700_ref45) 2016; 44
Shen (2020051819282474700_ref31) 2008; 373
Rokach (2020051819282474700_ref49) 2005
Bhasin (2020051819282474700_ref9) 2004; 279
Agris (2020051819282474700_ref62) 2008; 9
Du (2020051819282474700_ref67) 2015; 14
Liu (2020051819282474700_ref15) 2015; 10
Lopez (2020051819282474700_ref55) 2017; 527
Chou (2020051819282474700_ref3) 1978; 47
Liu (2020051819282474700_ref16) 2016; 32
Chen (2020051819282474700_ref13) 2015; 16
McCulloch (2020051819282474700_ref42) 1943; 5
Liu (2020051819282474700_ref2) 2017
Sun (2020051819282474700_ref60) 2016; 44
David (2020051819282474700_ref61) 2017; 29
Du (2020051819282474700_ref33) 2014; 15
Toronen (2020051819282474700_ref1) 2018; 46
Jain (2020051819282474700_ref50) 2010; 31
Liu (2020051819282474700_ref56) 2018; 34
Chen (2020051819282474700_ref4) 2018; 19
Cao (2020051819282474700_ref6) 2015; 31
Wang (2020051819282474700_ref36) 2017; 33
Chou (2020051819282474700_ref28) 2011; 273
Cheng (2020051819282474700_ref51) 1995; 17
Freedman (2020051819282474700_ref44) 2006; 48
Song (2020051819282474700_ref27) 2018
Larranaga (2020051819282474700_ref38) 2006; 7
Xiao (2020051819282474700_ref34) 2015; 31
Chen (2020051819282474700_ref37) 2018; 34
Liu (2020051819282474700_ref24) 2015; 43
Alexandrov (2020051819282474700_ref63) 2006; 21
Rao (2020051819282474700_ref30) 2011; 39
He (2020051819282474700_ref47) 2018; 35
Li (2020051819282474700_ref29) 2006; 34
Xuan (2020051819282474700_ref59) 2018; 46
Liu (2020051819282474700_ref20) 2017; 33
Zuo (2020051819282474700_ref35) 2017; 33
Yan (2020051819282474700_ref18) 2016; 17
Chen (2020051819282474700_ref23) 2013; 41
Ester (2020051819282474700_ref52)
Chen (2020051819282474700_ref14) 2011; 6
Zhang (2020051819282474700_ref65) 2018; 550
Rottig (2020051819282474700_ref10) 2010; 6
Motorin (2020051819282474700_ref64) 2010; 38
Altman (2020051819282474700_ref43) 1992; 46
Breiman (2020051819282474700_ref41) 2001; 45
Chou (2020051819282474700_ref8) 2008; 3
Cortes (2020051819282474700_ref40) 1995; 20
Song (2020051819282474700_ref11) 2010; 26
Lin (2020051819282474700_ref21) 2014; 42
Chen (2020051819282474700_ref68) 2018; 16
References_xml – volume: 6
  start-page: e1000636
  year: 2010
  ident: 2020051819282474700_ref10
  article-title: Combining structure and sequence information allows automated prediction of substrate specificities within enzyme families
  publication-title: PLoS Comput Biol
  doi: 10.1371/journal.pcbi.1000636
– volume: 33
  start-page: 2756
  year: 2017
  ident: 2020051819282474700_ref36
  article-title: POSSUM: a bioinformatics toolkit for generating numerical sequence feature descriptors based on PSSM profiles
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btx302
– volume: 43
  start-page: W65
  year: 2015
  ident: 2020051819282474700_ref24
  article-title: Pse-in-one: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkv458
– volume: 31
  start-page: 279
  year: 2015
  ident: 2020051819282474700_ref6
  article-title: Rcpi: R/bioconductor package to generate various descriptors of proteins, compounds and their interactions
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu624
– volume: 7
  start-page: 86
  year: 2006
  ident: 2020051819282474700_ref38
  article-title: Machine learning in bioinformatics
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbk007
– volume: 17
  start-page: 88
  year: 2016
  ident: 2020051819282474700_ref18
  article-title: A comprehensive comparative review of sequence-based predictors of DNA- and RNA-binding residues
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbv023
– volume: 19
  start-page: 231
  year: 2018
  ident: 2020051819282474700_ref4
  article-title: A comprehensive review and comparison of different computational methods for protein remote homology detection
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbw108
– volume: 29
  start-page: 960
  year: 2013
  ident: 2020051819282474700_ref32
  article-title: propy: a tool to generate various modes of Chou’s PseAAC
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btt072
– volume: 46
  start-page: D296
  year: 2018
  ident: 2020051819282474700_ref17
  article-title: miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx1067
– volume: 20
  start-page: 273
  year: 1995
  ident: 2020051819282474700_ref40
  article-title: Support-vector networks
  publication-title: Mach Learn
  doi: 10.1007/BF00994018
– volume: 79
  start-page: 1
  year: 2017
  ident: 2020051819282474700_ref5
  article-title: Protein fold recognition based on sparse representation based classification
  publication-title: Artif Intell Med
  doi: 10.1016/j.artmed.2017.03.006
– volume: 5
  start-page: 115
  year: 1943
  ident: 2020051819282474700_ref42
  article-title: A logical calculus of the ideas immanent in nervous activity
  publication-title: Bull Math Biophys
  doi: 10.1007/BF02478259
– volume: 47
  start-page: 45
  year: 1978
  ident: 2020051819282474700_ref3
  article-title: Prediction of the secondary structure of proteins from their amino acid sequence
  publication-title: Adv Enzymol Relat Areas Mol Biol
– volume: 12
  start-page: 44
  year: 2018
  ident: 2020051819282474700_ref46
  article-title: 70ProPred: a predictor for discovering sigma70 promoters based on combining multiple features
  publication-title: BMC Syst Biol
  doi: 10.1186/s12918-018-0570-1
– volume: 9
  start-page: 629
  year: 2008
  ident: 2020051819282474700_ref62
  article-title: Bringing order to translation: the contributions of transfer RNA anticodon-domain modifications
  publication-title: EMBO Rep
  doi: 10.1038/embor.2008.104
– volume: 33
  start-page: 35
  year: 2017
  ident: 2020051819282474700_ref20
  article-title: iRSpot-EL: identify recombination spots with an ensemble learning approach
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw539
– volume: 17
  start-page: 790
  year: 1995
  ident: 2020051819282474700_ref51
  article-title: Mean shift, mode seeking, and clustering
  publication-title: IEEE Trans Pattern Anal Mach Intell
  doi: 10.1109/34.400568
– volume: 14
  start-page: 227
  year: 2015
  ident: 2020051819282474700_ref67
  article-title: Lysine malonylation is elevated in type 2 diabetic mouse models and enriched in metabolic associated proteins
  publication-title: Mol Cell Proteomics
  doi: 10.1074/mcp.M114.041947
– year: 2017
  ident: 2020051819282474700_ref2
  article-title: BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches
  publication-title: Brief Bioinform
– volume: 31
  start-page: 651
  year: 2010
  ident: 2020051819282474700_ref50
  article-title: Data clustering: 50 years beyond K-means
  publication-title: Pattern Recognit Lett
  doi: 10.1016/j.patrec.2009.09.011
– volume: 32
  start-page: 362
  year: 2016
  ident: 2020051819282474700_ref16
  article-title: iEnhancer-2L: a two-layer predictor for identifying enhancers and their strength by pseudo k-tuple nucleotide composition
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv604
– volume: 34
  start-page: 2499
  year: 2018
  ident: 2020051819282474700_ref37
  article-title: iFeature: a Python package and web server for features extraction and selection from protein and peptide sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty140
– volume: 16
  start-page: 640
  year: 2015
  ident: 2020051819282474700_ref13
  article-title: Towards more accurate prediction of ubiquitination sites: a comprehensive review of current methods, tools and features
  publication-title: Brief Bioinform
  doi: 10.1093/bib/bbu031
– volume: 42
  start-page: 12961
  year: 2014
  ident: 2020051819282474700_ref21
  article-title: iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gku1019
– volume: 104
  start-page: 4337
  year: 2007
  ident: 2020051819282474700_ref7
  article-title: Predicting protein–protein interactions based only on sequences information
  publication-title: Proc Natl Acad Sci U S A
  doi: 10.1073/pnas.0607879104
– volume: 26
  start-page: 752
  year: 2010
  ident: 2020051819282474700_ref11
  article-title: Cascleave: towards more accurate prediction of caspase substrate cleavage sites
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btq043
– volume: 6
  start-page: e22930
  year: 2011
  ident: 2020051819282474700_ref14
  article-title: Prediction of ubiquitination sites by using the composition of k-spaced amino acid pairs
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0022930
– year: 2018
  ident: 2020051819282474700_ref27
  article-title: iProt-Sub: a comprehensive package for accurately mapping and predicting protease-specific substrates and cleavage sites
  publication-title: Brief Bioinform
– volume: 273
  start-page: 236
  year: 2011
  ident: 2020051819282474700_ref28
  article-title: Some remarks on protein attribute prediction and pseudo amino acid composition
  publication-title: J Theor Biol
  doi: 10.1016/j.jtbi.2010.12.024
– volume: 370
  start-page: 1
  year: 2007
  ident: 2020051819282474700_ref54
  article-title: Recent progress in protein subcellular location prediction
  publication-title: Anal Biochem
  doi: 10.1016/j.ab.2007.07.006
– volume: 46
  start-page: W84
  year: 2018
  ident: 2020051819282474700_ref1
  article-title: PANNZER2: a rapid functional annotation web server
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gky350
– volume: 18
  start-page: 1856
  year: 2017
  ident: 2020051819282474700_ref19
  article-title: PSFM-DBT: identifying DNA-binding proteins by combing position specific frequency matrix and distance-bigram transformation
  publication-title: Int J Mol Sci
  doi: 10.3390/ijms18091856
– volume: 315
  start-page: 972
  year: 2007
  ident: 2020051819282474700_ref53
  article-title: Clustering by passing messages between data points
  publication-title: Science
  doi: 10.1126/science.1136800
– volume: 44
  start-page: D259
  year: 2016
  ident: 2020051819282474700_ref60
  article-title: RMBase: a resource for decoding the landscape of RNA modifications from high-throughput sequencing data
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkv1036
– volume: 550
  start-page: 41
  year: 2018
  ident: 2020051819282474700_ref65
  article-title: Accurate RNA 5-methylcytosine site prediction based on heuristic physical–chemical properties reduction and classifier ensemble
  publication-title: Anal Biochem
  doi: 10.1016/j.ab.2018.03.027
– volume: 31
  start-page: 1857
  year: 2015
  ident: 2020051819282474700_ref34
  article-title: protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btv042
– volume: 35
  start-page: 593
  year: 2018
  ident: 2020051819282474700_ref47
  article-title: 4mCPred: machine learning methods for DNA N4-methylcytosine sites prediction
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty668
– volume: 3
  start-page: 153
  year: 2008
  ident: 2020051819282474700_ref8
  article-title: Cell-PLoc: a package of web servers for predicting subcellular localization of proteins in various organisms
  publication-title: Nat Protoc
  doi: 10.1038/nprot.2007.494
– volume: 15
  start-page: 3495
  year: 2014
  ident: 2020051819282474700_ref33
  article-title: PseAAC-general: fast building various modes of general form of Chou’s pseudo-amino acid composition for large-scale protein datasets
  publication-title: Int J Mol Sci
  doi: 10.3390/ijms15033495
– volume: 443
  start-page: 125
  year: 2018
  ident: 2020051819282474700_ref26
  article-title: PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework
  publication-title: J Theor Biol
  doi: 10.1016/j.jtbi.2018.01.023
– volume: 33
  start-page: 122
  year: 2017
  ident: 2020051819282474700_ref35
  article-title: PseKRAAC: a flexible web server for generating pseudo K-tuple reduced amino acids composition
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btw564
– volume: 46
  start-page: D327
  year: 2018
  ident: 2020051819282474700_ref59
  article-title: RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkx934
– volume: 46
  start-page: 175
  year: 1992
  ident: 2020051819282474700_ref43
  article-title: An introduction to kernel and nearest-neighbor nonparametric regression
  publication-title: Am Stat
  doi: 10.1080/00031305.1992.10475879
– volume: 34
  start-page: W32
  year: 2006
  ident: 2020051819282474700_ref29
  article-title: PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkl305
– volume: 34
  start-page: 33
  year: 2018
  ident: 2020051819282474700_ref56
  article-title: iPromoter-2L: a two-layer predictor for identifying promoters and their types by multi-window-based PseKNC
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btx579
– volume: 16
  start-page: 451
  year: 2018
  ident: 2020051819282474700_ref68
  article-title: Integration of a deep learning classifier with a random forest approach for predicting malonylation sites
  publication-title: Genomics Proteomics Bioinformatics
  doi: 10.1016/j.gpb.2018.08.004
– volume: 44
  start-page: e91
  year: 2016
  ident: 2020051819282474700_ref45
  article-title: SRAMP: prediction of mammalian N6-methyladenosine (m6A) sites based on sequence-derived features
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkw104
– volume: 373
  start-page: 386
  year: 2008
  ident: 2020051819282474700_ref31
  article-title: PseAAC: a flexible web server for generating various kinds of protein pseudo amino acid composition
  publication-title: Anal Biochem
  doi: 10.1016/j.ab.2007.10.012
– volume: 45
  start-page: 5
  year: 2001
  ident: 2020051819282474700_ref41
  article-title: Random forests
  publication-title: Mach Learn
  doi: 10.1023/A:1010933404324
– volume: 21
  start-page: 87
  year: 2006
  ident: 2020051819282474700_ref63
  article-title: Rapid tRNA decay can result from lack of nonessential modifications
  publication-title: Mol Cell
  doi: 10.1016/j.molcel.2005.10.036
– volume: 31
  start-page: 1307
  year: 2015
  ident: 2020051819282474700_ref25
  article-title: repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu820
– volume: 279
  start-page: 23262
  year: 2004
  ident: 2020051819282474700_ref9
  article-title: Classification of nuclear receptors based on amino acid composition and dipeptide composition
  publication-title: J Biol Chem
  doi: 10.1074/jbc.M401932200
– volume: 10
  start-page: e0121501
  year: 2015
  ident: 2020051819282474700_ref15
  article-title: Identification of real microRNA precursors with a pseudo structure status composition approach
  publication-title: PLoS One
  doi: 10.1371/journal.pone.0121501
– volume: 527
  start-page: 24
  year: 2017
  ident: 2020051819282474700_ref55
  article-title: SucStruct: prediction of succinylated lysine residues by using structural properties of amino acids
  publication-title: Anal Biochem
  doi: 10.1016/j.ab.2017.03.021
– volume: 31
  start-page: 264
  year: 1999
  ident: 2020051819282474700_ref48
  article-title: Data clustering: a review
  publication-title: ACM Comput Surv
  doi: 10.1145/331499.331504
– volume: 1834
  start-page: 1461
  year: 2013
  ident: 2020051819282474700_ref12
  article-title: hCKSAAP_UbSite: improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties
  publication-title: Biochim Biophys Acta
– volume: 39
  start-page: W385
  year: 2011
  ident: 2020051819282474700_ref30
  article-title: Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkr284
– volume: 48
  start-page: 315
  year: 2006
  ident: 2020051819282474700_ref44
  article-title: Statistical models: theory and practice
  publication-title: Technometrics
  doi: 10.1198/tech.2006.s403
– volume: 41
  start-page: e68
  year: 2013
  ident: 2020051819282474700_ref23
  article-title: iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gks1450
– start-page: 226
  volume-title: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining
  ident: 2020051819282474700_ref52
– start-page: 321
  volume-title: Data Mining and Knowledge Discovery Handbook
  year: 2005
  ident: 2020051819282474700_ref49
  doi: 10.1007/0-387-25465-X_15
– volume: 30
  start-page: 1522
  year: 2014
  ident: 2020051819282474700_ref22
  article-title: iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/btu083
– volume: 9
  start-page: 519
  year: 2018
  ident: 2020051819282474700_ref66
  article-title: Transcriptome-wide annotation of m(5)C RNA modifications using machine learning
  publication-title: Front Plant Sci
  doi: 10.3389/fpls.2018.00519
– volume: 16
  start-page: 321
  year: 2015
  ident: 2020051819282474700_ref39
  article-title: Machine learning applications in genetics and genomics
  publication-title: Nat Rev Genet
  doi: 10.1038/nrg3920
– volume: 38
  start-page: 1415
  year: 2010
  ident: 2020051819282474700_ref64
  article-title: 5-methylcytosine in RNA: detection, enzymatic formation and biological functions
  publication-title: Nucleic Acids Res
  doi: 10.1093/nar/gkp1117
– volume: 29
  start-page: 445
  year: 2017
  ident: 2020051819282474700_ref61
  article-title: Transcriptome-wide mapping of RNA 5-Methylcytosine in Arabidopsis mRNAs and noncoding RNAs
  publication-title: Plant Cell
  doi: 10.1105/tpc.16.00751
– volume: 442
  start-page: 118
  year: 2013
  ident: 2020051819282474700_ref58
  article-title: iHSP-PseRAAAC: identifying the heat shock protein families using pseudo reduced amino acid alphabet composition
  publication-title: Anal Biochem
  doi: 10.1016/j.ab.2013.05.024
– volume: 34
  start-page: 4223
  year: 2018
  ident: 2020051819282474700_ref57
  article-title: Quokka: a comprehensive tool for rapid and accurate prediction of kinase family-specific phosphorylation sites in the human proteome
  publication-title: Bioinformatics
  doi: 10.1093/bioinformatics/bty522
SSID ssj0020781
Score 2.65866
Snippet With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational...
SourceID proquest
pubmed
crossref
oup
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 1047
SubjectTerms Algorithms
Amino acid sequence
Bioinformatics
Clustering
Computer applications
Deoxyribonucleic acid
DNA
Engineering education
Feature extraction
Gene sequencing
Internet
Learning algorithms
Machine learning
Nucleotide sequence
Proteins
Reduction
Ribonucleic acid
RNA
Software
Toolkits
Title iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data
URI https://www.ncbi.nlm.nih.gov/pubmed/31067315
https://www.proquest.com/docview/2429011102
https://www.proquest.com/docview/2231894051
Volume 21