Detecting Biosignatures in Complex Molecular Mixtures From Pyrolysis‐Gas Chromatography‐Mass Spectrometry Data Using Machine Learning
Published in | Journal of geophysical research. Machine learning and computation Vol. 2; no. 3 |
---|---|
Main Authors | , , , , , , |
Format | Journal Article |
Language | English |
Published | 01.09.2025 |
Summary: | Understanding how measured molecular signals can distinguish the chemistry of life from the chemistry of the nonliving world is a central focus of astrobiology and paleobiology. We train and compare several machine learning (ML) classification models on data from pyrolysis‐gas chromatography‐mass spectrometry (py‐GC‐MS), a widely available analytical method that has been employed in space missions. We analyzed various organic carbon‐bearing geomaterials to consider relationships among suites of molecules that can help identify their biogenicity and potentially be used to analyze data from various solar system exploration missions. These supervised classification models can discriminate between abiotic and biotic samples with ∼86–89% accuracy. We use and compare four different ML models, coupled with a range of statistical and visualization methods, to investigate the patterns and distribution of diagnostic features: specific combinations of chromatographic retention time and mass‐to‐charge ratio that contribute to the classification of the samples into biologically derived versus abiologically derived materials. These diagnostic discriminators are common in biotic samples and rare in most abiotic samples and hence point to a potential agnostic molecular biosignature. They also tend to have higher normalized intensity values in biologically derived materials and display different distributions in contemporary biotic samples compared to taphonomically altered biotic samples. We utilize the full resolution of the 3D structure of the py‐GC‐MS data and describe in detail the preprocessing steps and the ML pipeline for analyzing such data, which could be automated for future data collection.
Astrobiology and paleobiology are concerned with determining what distinguishes the chemistry of life from the chemistry of the nonliving world. We hypothesize that the diversity and distribution of molecules in biologically derived materials (e.g., plants, animal tissue, bacteria, and coal) are different from those in abiotic materials (e.g., carbon‐rich meteorites and laboratory‐made synthetic reactions). To test this hypothesis, we analyzed a diverse collection of natural and synthetic organic molecular mixtures using pyrolysis‐gas chromatography‐mass spectrometry (py‐GC‐MS), a widely available analytical method that has been used in solar system exploration missions. In py‐GC‐MS, samples are heated, decomposed into smaller components, and separated into fragment ions for molecular identification. We train and compare several machine learning classification models to predict the biogenicity of the samples and to determine the patterns and distribution of features (specific combinations of chromatographic retention time and mass‐to‐charge ratio) that are important for distinguishing biologically derived samples from abiotic ones. These diagnostic features are both more commonly present and more abundant in biotic samples than in abiotic samples, and hence serve as potential molecular biosignatures.
Machine learning is applied to pyrolysis‐gas chromatography‐mass spectrometry to predict the biogenicity of various carbonaceous materials. Diagnostic features for discriminating biologically derived samples from abiotic samples have been identified. Potential molecular features identified as diagnostic biochemical discriminators are common in biotic samples and rare in most abiotic ones. |
---|---|
ISSN: | 2993-5210 |
DOI: | 10.1029/2024JH000441 |
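
The summary above describes diagnostic features as combinations of chromatographic retention time and mass‐to‐charge ratio that are common in biotic samples and rare in abiotic ones. The following is a minimal sketch of that idea only, not the authors' pipeline: the function names (`bin_scan`, `prevalence`, `prevalence_gap`), the bin widths, the presence threshold, and the toy peak lists are all assumptions made for illustration. It bins each sample's (retention time, m/z, intensity) triples onto a grid, normalizes intensities to the sample maximum, and ranks features by how much more often they appear in biotic than in abiotic samples.

```python
from collections import defaultdict


def bin_scan(peaks, rt_step=1.0, mz_step=1.0):
    """Bin (retention_time, mz, intensity) triples onto a grid.

    Returns {(rt_bin, mz_bin): intensity / max intensity in this sample}.
    The bin widths are illustrative assumptions, not values from the paper.
    """
    grid = defaultdict(float)
    for rt, mz, intensity in peaks:
        key = (int(rt // rt_step), int(mz // mz_step))
        grid[key] += intensity
    top = max(grid.values(), default=0.0)
    if top <= 0:
        return {}
    return {key: value / top for key, value in grid.items()}


def prevalence(samples, threshold=0.0):
    """Fraction of samples in which each feature exceeds the threshold."""
    counts = defaultdict(int)
    for features in samples:
        for key, value in features.items():
            if value > threshold:
                counts[key] += 1
    n = len(samples)
    return {key: count / n for key, count in counts.items()} if n else {}


def prevalence_gap(biotic, abiotic, threshold=0.0):
    """Rank features by (biotic prevalence - abiotic prevalence), descending."""
    p_bio = prevalence(biotic, threshold)
    p_abio = prevalence(abiotic, threshold)
    keys = set(p_bio) | set(p_abio)
    gaps = {k: p_bio.get(k, 0.0) - p_abio.get(k, 0.0) for k in keys}
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))


# Toy peak lists (hypothetical values, for illustration only).
biotic = [
    bin_scan([(10.2, 91.0, 500.0), (12.7, 57.0, 250.0)]),
    bin_scan([(10.6, 91.0, 300.0)]),
]
abiotic = [
    bin_scan([(12.1, 57.0, 400.0)]),
    bin_scan([(30.0, 44.0, 100.0)]),
]

ranked = prevalence_gap(biotic, abiotic)
for feature, gap in ranked:
    print(feature, gap)
```

In this toy data the feature at retention-time bin 10 and m/z bin 91 appears in every biotic sample and in no abiotic sample, so it ranks first. A real analysis would feed the binned, normalized features to supervised classifiers and assess feature importance there; the prevalence ranking here is only a simple stand-in for that step.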