Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

Bibliographic Details
Published in Journal of chemical information and modeling Vol. 58; no. 1; pp. 27-35
Main Authors Jaeger, Sabrina; Fulle, Simone; Turk, Samo
Format Journal Article
Language English
Published United States, American Chemical Society, 22.01.2018
ISSN 1549-9596
1549-960X
DOI 10.1021/acs.jcim.7b00616

Abstract Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Like the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, be fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pretrained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and thus can also be easily used for proteins with low sequence similarities.
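The encoding step described in the abstract (a compound vector obtained by summing the vectors of its substructures) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function name `compound_vector`, the substructure identifiers, the 3-dimensional toy vectors, and the optional `unseen` fallback are hypothetical and do not come from a trained Mol2vec model or from the authors' code.

```python
from typing import Dict, List, Optional, Sequence


def compound_vector(
    substructure_ids: Sequence[str],
    embeddings: Dict[str, List[float]],
    unseen: Optional[List[float]] = None,
) -> List[float]:
    """Sum the embedding of every substructure found in one compound."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for sub_id in substructure_ids:
        # Fall back to a shared "unseen" vector (an assumption of this sketch)
        # when the identifier is missing from the embedding table.
        vec = embeddings.get(sub_id, unseen)
        if vec is None:
            continue  # skip unknown substructures when no fallback is given
        total = [t + v for t, v in zip(total, vec)]
    return total


# Hypothetical 3-dimensional embeddings keyed by made-up substructure IDs.
toy_embeddings = {
    "sub_A": [1.0, 0.0, 2.0],
    "sub_B": [0.5, 1.5, -1.0],
}

# A compound is treated as a "sentence" of substructure identifiers.
vector = compound_vector(["sub_A", "sub_B", "sub_A"], toy_embeddings)
print(vector)  # [2.5, 1.5, 3.0]
```

The resulting dense vector has the same dimension as the substructure embeddings regardless of compound size, which is what allows it to be passed to a supervised model as a fixed-length feature vector.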
Author_xml – sequence: 1
  givenname: Sabrina
  orcidid: 0000-0003-1144-7468
  surname: Jaeger
  fullname: Jaeger, Sabrina
– sequence: 2
  givenname: Simone
  orcidid: 0000-0002-7646-5889
  surname: Fulle
  fullname: Fulle, Simone
  email: fulle@bio.mx
– sequence: 3
  givenname: Samo
  orcidid: 0000-0003-2044-7670
  surname: Turk
  fullname: Turk, Samo
  email: turk@bio.mx
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29268609 (View this record in MEDLINE/PubMed)
ContentType Journal Article
Copyright Copyright © 2017 American Chemical Society
Copyright American Chemical Society Jan 22, 2018
Discipline Chemistry
EISSN 1549-960X
EndPage 35
ExternalDocumentID 29268609
10_1021_acs_jcim_7b00616
a350265587
Genre Journal Article
Feature
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Language English
ORCID 0000-0002-7646-5889
0000-0003-2044-7670
0000-0003-1144-7468
OpenAccessLink https://figshare.com/articles/journal_contribution/Mol2vec_Unsupervised_Machine_Learning_Approach_with_Chemical_Intuition/5773974
PMID 29268609
PQID 2002988408
PQPubID 28739
PageCount 9
PublicationCentury 2000
PublicationDate 2018-01-22
PublicationDateYYYYMMDD 2018-01-22
PublicationDate_xml – month: 01
  year: 2018
  text: 2018-01-22
  day: 22
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: Washington
PublicationTitle Journal of chemical information and modeling
PublicationTitleAlternate J. Chem. Inf. Model
PublicationYear 2018
Publisher American Chemical Society
Publisher_xml – name: American Chemical Society
StartPage 27
SubjectTerms Algorithms
Artificial intelligence
Chemical compounds
Datasets as Topic
Machine learning
Models, Chemical
Molecular Structure
Molecules
Natural Language Processing
Protein Conformation
Proteins
Proteins - chemistry
Representations
Reproducibility of Results
Supervised learning
Unsupervised learning
Unsupervised Machine Learning
Vector spaces
Title Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition
URI http://dx.doi.org/10.1021/acs.jcim.7b00616
https://www.ncbi.nlm.nih.gov/pubmed/29268609
https://www.proquest.com/docview/2002988408
https://www.proquest.com/docview/1979966297
Volume 58