Deep learning for retention time prediction in reversed-phase liquid chromatography

Bibliographic Details
Published in: Journal of Chromatography A, Vol. 1664, p. 462792
Main Authors: Fedorova, Elizaveta S.; Matyushin, Dmitriy D.; Plyushchenko, Ivan V.; Stavrianidi, Andrey N.; Buryak, Aleksey K.
Format: Journal Article
Language: English
Published: Netherlands, Elsevier B.V., 08.02.2022

Summary:
• Several deep learning models were tested for accurate retention time (RT) prediction on a large data set.
• A 1D CNN was constructed to predict RTs with high accuracy based on the METLIN SMRT data set.
• RTs for five small RP-HPLC data sets were predicted using pre-trained 1D CNN models.

Retention time prediction in high-performance liquid chromatography (HPLC) is the subject of many studies because it can improve the identification of unknown molecules in untargeted profiling by HPLC coupled with high-resolution mass spectrometry. Many approaches to retention time prediction in liquid chromatography have been developed, covering data sets of different sizes and using various molecular properties and machine learning algorithms. The recently built large retention time data set of standard compounds from the Metabolite and Chemical Entity Database (METLIN) allows researchers to create models that predict retention times for small molecules with a wide variety of structures and physicochemical properties. The ability to predict retention times on the largest data set was studied for several deep learning architectures trained either on molecular fingerprints or on SMILES (a string representation of a molecule) encoded as one-hot matrices. The best result was achieved with a one-dimensional convolutional neural network (1D CNN) that takes SMILES as input. The proposed model achieved a mean absolute error of 34.7 s and a median absolute error of 18.7 s, outperforming the results previously obtained for this data set. To evaluate its generalization ability, the 1D CNN pre-trained on the METLIN SMRT data set was transferred to five other data sets.
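The summary states that SMILES strings were "represented as one-hot matrices" before being fed to the 1D CNN. The sketch below illustrates that idea under stated assumptions: the character vocabulary, the character-level tokenization (so "Cl" becomes "C" + "l"), and the fixed maximum length of 64 with zero-row padding are all hypothetical choices, not the encoding used in the paper.

```python
from typing import List, Sequence

# Hypothetical character vocabulary (assumption; the paper's alphabet is not given here).
# Character-level tokenization: two-letter atoms such as "Cl" and "Br" are split
# into separate characters, which is why "l" and "r" appear in the vocabulary.
VOCAB = sorted(set("CNOSPFIBrcnosl()[]=#+-@/\\123456789H"))


def one_hot_smiles(smiles: str, vocab: Sequence[str] = VOCAB,
                   max_len: int = 64) -> List[List[int]]:
    """Encode a SMILES string as a max_len x len(vocab) one-hot matrix.

    Row i marks the character at position i; rows past the end of the
    string are left all-zero as padding (max_len is an assumed value).
    """
    index = {ch: i for i, ch in enumerate(vocab)}
    if len(smiles) > max_len:
        raise ValueError(f"SMILES longer than max_len={max_len}: {smiles!r}")
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles):
        if ch not in index:
            raise ValueError(f"character {ch!r} not in vocabulary")
        matrix[pos][index[ch]] = 1
    return matrix


if __name__ == "__main__":
    ethanol = one_hot_smiles("CCO")
    print(len(ethanol), len(ethanol[0]), sum(map(sum, ethanol)))
```

Every SMILES string maps to a matrix of the same shape regardless of molecule size, which is what lets a convolutional network consume it directly.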
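To show what a 1D CNN does with a one-hot SMILES matrix, here is a minimal forward pass in plain Python: filters slide along the sequence axis, a ReLU is applied, global max pooling keeps each filter's strongest response, and a linear layer outputs a single retention time. This is a sketch only. The single convolutional layer, the filter count and width, global max pooling, and the randomly initialised weights (`random_params`, a hypothetical helper) are illustrative assumptions, not the architecture reported in the paper, and training is omitted.

```python
import random
from typing import List

Matrix = List[List[float]]


def conv1d_relu(x: Matrix, kernels: List[Matrix], biases: List[float]) -> Matrix:
    """'Valid' 1D convolution along the sequence axis, followed by ReLU.

    x is L x C (sequence positions x one-hot channels); each kernel is W x C.
    As in most deep learning libraries, kernels are not flipped (strictly,
    cross-correlation). Returns an (L - W + 1) x K feature map for K kernels.
    """
    width = len(kernels[0])
    feature_map = []
    for start in range(len(x) - width + 1):
        row = []
        for kernel, bias in zip(kernels, biases):
            total = bias
            for offset in range(width):
                total += sum(xi * wi for xi, wi in zip(x[start + offset], kernel[offset]))
            row.append(max(0.0, total))  # ReLU
        feature_map.append(row)
    return feature_map


def global_max_pool(feature_map: Matrix) -> List[float]:
    """Keep the strongest response of each filter over all positions."""
    return [max(column) for column in zip(*feature_map)]


def predict_rt(x: Matrix, kernels: List[Matrix], biases: List[float],
               dense_w: List[float], dense_b: float) -> float:
    """Conv -> ReLU -> global max pool -> linear output (one RT value)."""
    pooled = global_max_pool(conv1d_relu(x, kernels, biases))
    return dense_b + sum(w * p for w, p in zip(dense_w, pooled))


def random_params(channels: int, n_filters: int = 8, width: int = 3, seed: int = 0):
    """Hypothetical untrained weights; a real model learns these from data."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(width)]
               for _ in range(n_filters)]
    biases = [0.0] * n_filters
    dense_w = [rng.gauss(0.0, 0.1) for _ in range(n_filters)]
    return kernels, biases, dense_w, 0.0
```

Because pooling collapses the sequence axis, the output size does not depend on SMILES length. Each filter effectively acts as a learned detector for a short substring pattern wherever it occurs in the molecule's string.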
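The two reported metrics are the mean absolute error, MAE = (1/n) Σ |y_i − ŷ_i|, and the median absolute error, MedAE = median |y_i − ŷ_i|, both in seconds here. A minimal standard-library sketch follows; the numbers in the example are made up for illustration and are not data from the paper.

```python
from statistics import mean, median
from typing import List, Sequence


def _abs_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> List[float]:
    """Absolute differences between observed and predicted retention times."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return [abs(t - p) for t, p in zip(y_true, y_pred)]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return mean(_abs_errors(y_true, y_pred))


def median_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return median(_abs_errors(y_true, y_pred))


if __name__ == "__main__":
    observed = [100.0, 200.0, 300.0, 400.0]   # illustrative RTs, s
    predicted = [110.0, 190.0, 330.0, 400.0]  # illustrative predictions, s
    print(mean_absolute_error(observed, predicted))    # 12.5
    print(median_absolute_error(observed, predicted))  # 10.0
```

The median is insensitive to a few large errors while the mean is not. An MAE noticeably above the MedAE, as with 34.7 s versus 18.7 s, is therefore consistent with most predictions being close and a smaller number of compounds having large errors.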

ISSN: 0021-9673, 1873-3778
DOI: 10.1016/j.chroma.2021.462792