Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

Bibliographic Details
Published in	Journal of chemical information and modeling Vol. 58; no. 1; pp. 27 - 35
Main Authors	Jaeger, Sabrina, Fulle, Simone, Turk, Samo
Format	Journal Article
Language	English
Published	United States American Chemical Society 22.01.2018
Subjects	Algorithms Artificial intelligence Chemical compounds Datasets as Topic Machine learning Models, Chemical Molecular Structure Molecules Natural Language Processing Protein Conformation Proteins Proteins - chemistry Representations Reproducibility of Results Supervised learning Unsupervised learning Unsupervised Machine Learning Vector spaces
Online Access	Get full text
ISSN	1549-9596 1549-960X
DOI	10.1021/acs.jcim.7b00616

Abstract	Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Like the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing the vectors of the individual substructures and, for instance, be fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pretrained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and thus can also be easily used for proteins with low sequence similarities.
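
The encoding step described in the abstract (summing substructure vectors to obtain a compound vector, with related substructures pointing in similar directions) can be illustrated with a minimal sketch. The substructure identifiers, the 4-dimensional vectors, and the zero-vector fallback below are hypothetical values chosen for illustration; they are not taken from the published model or its software.

```python
import numpy as np

# Hypothetical embeddings keyed by made-up substructure identifiers.
# Real Mol2vec vectors are learned from a compound corpus; these numbers are illustrative only.
EMBEDDINGS = {
    "sub_a": np.array([0.9, 0.1, 0.0, 0.2]),
    "sub_b": np.array([0.8, 0.2, 0.1, 0.1]),
    "sub_c": np.array([-0.3, 0.7, 0.5, 0.0]),
}

# Assumption: substructures missing from the table contribute a zero vector.
FALLBACK = np.zeros(4)


def compound_vector(substructure_ids):
    """Encode a compound as the sum of its substructure vectors."""
    return np.sum([EMBEDDINGS.get(s, FALLBACK) for s in substructure_ids], axis=0)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


# A toy compound made of three substructures.
compound = compound_vector(["sub_a", "sub_b", "sub_c"])

# In this toy table, sub_a and sub_b are deliberately placed close together,
# so their cosine similarity is higher than that of sub_a and sub_c.
sim_ab = cosine_similarity(EMBEDDINGS["sub_a"], EMBEDDINGS["sub_b"])
sim_ac = cosine_similarity(EMBEDDINGS["sub_a"], EMBEDDINGS["sub_c"])
```

The resulting dense `compound` array is the kind of fixed-length feature vector that could then be passed to a supervised model.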
Author details	Jaeger, Sabrina (ORCID 0000-0003-1144-7468); Fulle, Simone (ORCID 0000-0002-7646-5889; email fulle@bio.mx); Turk, Samo (ORCID 0000-0003-2044-7670; email turk@bio.mx)
PubMed record	https://www.ncbi.nlm.nih.gov/pubmed/29268609
Copyright	Copyright © 2017 American Chemical Society; Copyright American Chemical Society Jan 22, 2018
Discipline	Chemistry
Genre	Journal Article Feature
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
OpenAccessLink	https://figshare.com/articles/journal_contribution/Mol2vec_Unsupervised_Machine_Learning_Approach_with_Chemical_Intuition/5773974
PMID	29268609
PageCount	9
PublicationPlace	United States; Washington
PublicationTitleAlternate	J. Chem. Inf. Model
PublicationYear	2018
Publisher	American Chemical Society
Publisher_xml	– name: American Chemical Society
References	ref9/cit9 ref6/cit6 ref3/cit3 ref27/cit27 ref18/cit18 Srivastava N. (ref38/cit38) 2014; 15 ref11/cit11 ref25/cit25 ref16/cit16 ref29/cit29 ref39/cit39 ref14/cit14 ref8/cit8 ref5/cit5 ref31/cit31 ref2/cit2 ref43/cit43 ref34/cit34 ref37/cit37 ref28/cit28 ref40/cit40 ref20/cit20 ref17/cit17 ref10/cit10 Řehuřek R. (ref23/cit23) 2010 ref26/cit26 ref35/cit35 ref19/cit19 ref21/cit21 ref12/cit12 Nair V. (ref36/cit36) 2010 ref15/cit15 Pedregosa F. (ref32/cit32) 2011; 12 ref42/cit42 ref41/cit41 ref22/cit22 ref13/cit13 ref33/cit33 ref4/cit4 ref30/cit30 ref1/cit1 ref24/cit24 ref44/cit44 ref7/cit7
References_xml	– ident: ref9/cit9 doi: 10.1021/acs.jcim.6b00601 – ident: ref20/cit20 – ident: ref5/cit5 doi: 10.3389/fenvs.2015.00080 – ident: ref12/cit12 doi: 10.1101/086033 – ident: ref31/cit31 – ident: ref37/cit37 – ident: ref1/cit1 doi: 10.1021/ci100050t – start-page: 807 volume-title: Proceedings of the 27th International Conference on Machine Learning (ICML-10) year: 2010 ident: ref36/cit36 – ident: ref8/cit8 doi: 10.1007/s10822-016-9938-8 – volume: 15 start-page: 1929 year: 2014 ident: ref38/cit38 publication-title: J. Mach. Learn. Res. – ident: ref27/cit27 doi: 10.1109/ICCV.2011.6126527 – ident: ref7/cit7 doi: 10.1002/cmdc.201700180 – ident: ref10/cit10 – ident: ref21/cit21 doi: 10.1021/ci990307l – ident: ref35/cit35 – ident: ref44/cit44 doi: 10.1039/C4MD00216D – ident: ref14/cit14 doi: 10.1021/acs.jcim.7b00249 – ident: ref3/cit3 doi: 10.1186/s13321-016-0148-0 – ident: ref6/cit6 doi: 10.1021/acs.jmedchem.6b01611 – ident: ref34/cit34 – ident: ref28/cit28 doi: 10.1021/ci034243x – ident: ref40/cit40 – volume: 12 start-page: 2825 year: 2011 ident: ref32/cit32 publication-title: J. Mach. Learn. Res. 
– ident: ref43/cit43 doi: 10.1016/j.tiv.2017.02.016 – ident: ref42/cit42 doi: 10.1021/ci400187y – ident: ref39/cit39 – ident: ref11/cit11 doi: 10.1021/jm000393c – ident: ref4/cit4 doi: 10.1021/ci400466r – ident: ref26/cit26 doi: 10.1093/nar/gkv1082 – ident: ref13/cit13 doi: 10.1186/s13321-017-0235-x – ident: ref19/cit19 doi: 10.1093/nar/gkt1031 – ident: ref16/cit16 doi: 10.1371/journal.pone.0141287 – ident: ref29/cit29 doi: 10.1021/ci900161g – ident: ref24/cit24 doi: 10.1093/nar/gkw1099 – ident: ref33/cit33 doi: 10.1145/2939672.2939785 – ident: ref17/cit17 doi: 10.1021/ci3001277 – ident: ref15/cit15 – start-page: 45 volume-title: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks year: 2010 ident: ref23/cit23 – ident: ref41/cit41 – ident: ref18/cit18 doi: 10.1093/nar/gkr777 – ident: ref22/cit22 doi: 10.1021/acs.jcim.5b00543 – ident: ref2/cit2 doi: 10.1186/1758-2946-5-26 – ident: ref25/cit25 doi: 10.1021/jm9700575 – ident: ref30/cit30
StartPage	27
Title	Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition
URI	http://dx.doi.org/10.1021/acs.jcim.7b00616 https://www.ncbi.nlm.nih.gov/pubmed/29268609 https://www.proquest.com/docview/2002988408 https://www.proquest.com/docview/1979966297
Volume	58
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
link	http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1LT9wwELYoHOgFCn2wQJGR4MAhi99e94ZW5SUtl7ISt8ixHQRdsojscuDXM_Ymi1paxDWJE9sznvnsmXyD0F5RMA2CtpkuSp4JpxwsKVJmXiqwfZaXZeIpGFyo06E4v5JXLzQ5f0fwGT20ru7eupu7ro4aQtUHtMQUrOEIg_q_WqvLuUnFQyPjWGakaUOS_3pDdESu_tMR_QddJi9zvDorV1QncsKYXPK7O50UXff0mrrxHQP4hFYasImPZtqxhhZCtY6W-22Nt8_oZDAescfgfuBhVU_vo92og8eDlGEZcEO-eo2PGuZxHI9tcUsygM_AYaWUry9oePzzsn-aNaUVMisEn8CuEfyQsNwGIX1JiWGeSEJKrYSWHjAL6dn4F20AAGikDtxQ5qURXlGrSeH5V7RYjauwgbCSjnpLwVhZKkourQJZGDAUulcQxUwH7cMM5M3SqPMU9WY0TxdhWvJmWjrosJVH7hp-8lgmY_RGi4N5i_sZN8cbz263In7pCksE9LDF7XXQ7vw2iCAGTWwVxlPobox7KhiH7qBvM9WYf4wZ0ERFzOY7h7iFPgLeismDGWPbaHHyMA3fAdNMip2kzM9bCO2m
linkProvider	American Chemical Society
linkToHtml	http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Lb9QwEB6V9lAuQMujCwVcCQ4csvXba26rFe2WdnuArtRb5MQOgpZsRXY58OsZe5OtiqCiVyd2xp7xzDgz_gbgTVFwg4x2mSkqkclSl7ilaJV5pVH3OVFVCadgcqrHU_nxXJ2vAevuwiARDY7UpCD-NboA249t38qv3_smCgrT92ADfREehXo4-twpXyFsqiEagccyq2wXmfzbCNEelc1Ne_QPJzMZm4OH8GlFZsoxuegv5kW__PUHguOd5vEIHrSuJxkuZWUL1kK9DZujruLbYziczC75z1C-J9O6WVxFLdIETyYp3zKQFor1Cxm2OOQk_sQlHeQAOULzlRLAnsD04MPZaJy1hRYyJ6WY4xkSrZJ0wgWpfMWo5Z4qSiujpVEePRg6cPFObUB30CoThGXcKyu9Zs7QwounsF7P6rADRKuSecdQdTkmK6GcRpZYVBtmUFDNbQ_e4grk7UZp8hQD5yxPjbgsebssPdjv2JKXLVp5LJpxeUuPd6seV0ukjlve3e04fU0KT3D0eOAd9GBv9RhZEEMorg6zBZIbo6Aa52F68GwpIauPccv1QFP7_D-n-Bo2x2eTk_zk6PT4BdxHTyymFWac78L6_McivERvZ168SvL9G4qT9gc
linkToPdf	http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3db9MwED-NTQJeYAMGhcGMBA88pPO3a96qQtkHnZCgaG-REzsIGGlFWh721-_sJkUgmODViZ2z73x3zp1_B_CsKLhBRrvMFJXIZKlL3FK0yrzSqPucqKqEUzA51YdTeXymzjZAdXdhkIgGR2pSED_u6rmvWoQBdhDbv5Sfv_VNFBamr8FWjNpFwR6O3ncKWAib6ohG8LHMKttFJ_80QrRJZfOrTfqLo5kMzvg2fFyTmvJMvvaXi6JfXvyG4vjfc9mGW60LSoYrmdmBjVDfgRujrvLbXXgzmZ3zH6F8SaZ1s5xHbdIETyYp7zKQFpL1Exm2eOQk_swlHfQAOUIzlhLB7sF0_PrD6DBrCy5kTkqxwLMkWifphAtS-YpRyz1VlFZGS6M8ejJ04OLd2oBuoVUmCMu4V1Z6zZyhhRe7sFnP6vAAiFYl846hCnNMVkI5jWyxqD7MoKCa2x48xxXI2w3T5CkWzlmeGnFZ8nZZenDQsSYvW9TyWDzj_IoeL9Y95ivEjive3eu4_ZMUnmDp8eA76MHT9WNkQQyluDrMlkhujIZqnIfpwf2VlKw_xi3XA03tw3-c4j5cf_dqnL89Oj15BDfRIYvZhRnne7C5-L4Mj9HpWRRPkohfAqw_-Io
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Mol2vec%3A+Unsupervised+Machine+Learning+Approach+with+Chemical+Intuition&rft.jtitle=Journal+of+chemical+information+and+modeling&rft.au=Jaeger%2C+Sabrina&rft.au=Fulle%2C+Simone&rft.au=Turk%2C+Samo&rft.date=2018-01-22&rft.pub=American+Chemical+Society&rft.issn=1549-9596&rft.eissn=1549-960X&rft.volume=58&rft.issue=1&rft.spage=27&rft_id=info:doi/10.1021%2Facs.jcim.7b00616&rft.externalDBID=NO_FULL_TEXT
thumbnail_l	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1549-9596&client=summon
thumbnail_m	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1549-9596&client=summon
thumbnail_s	http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1549-9596&client=summon