Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis

Bibliographic Details
Published in PloS one Vol. 16; no. 3; p. e0246287
Main Authors Karlsen, Signe Tang; Vesth, Tammi Camilla; Oregaard, Gunnar; Poulsen, Vera Kuzina; Lund, Ole; Henderson, Gemma; Bælum, Jacob
Format Journal Article
Language English
Published United States Public Library of Science 15.03.2021
Public Library of Science (PLoS)
Summary: Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is one such strain-dependent property. To predict the maximum hourly acidification rate (Vmax), we trained Random Forest (RF) models on four different genomic representations: presence/absence of gene families, counts of Pfam domains, the 8-nucleotide-long subsequences of their DNA (8-mers), and the 9-nucleotide-long subsequences of their DNA (9-mers). Vmax was measured at different temperatures and volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured Vmax and the predicted Vmax was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
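The k-mer representation and the Pearson evaluation mentioned in the summary can be illustrated with a minimal, standard-library Python sketch. It is not the authors' pipeline: the function names `kmer_counts` and `pearson`, the use of overlapping k-mer counts as feature values, and the toy sequence and Vmax numbers are all assumptions made for illustration, and the Random Forest model itself is not shown.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a DNA sequence.

    Assumption: features are raw counts; the summary does not say
    how the k-mers were encoded.
    """
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Hypothetical toy data, not taken from the study.
toy_genome = "ATGCGATGCA"
counts = kmer_counts(toy_genome, 3)

measured_vmax = [0.10, 0.20, 0.30, 0.40]
predicted_vmax = [0.12, 0.18, 0.33, 0.41]
r = pearson(measured_vmax, predicted_vmax)
```

In a workflow of the kind the summary describes, each strain's k-mer vector, together with the growth conditions, would form one row of the feature matrix for a regression model, and `pearson` would be applied to measured and predicted Vmax on the held-out strains.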
Competing Interests: The authors have read the journal’s policy and have the following competing interests: SK, TV, GO, VP, GH, and JB were employed at Chr. Hansen A/S. Chr. Hansen A/S provided support in the form of salaries and a pre-existent dataset. This does not alter our adherence to PLOS ONE policies on sharing data and materials. There are no patents, products in development or marketed products associated with this research to declare.
Current address: Bacthera Denmark A/S, Hoersholm, Denmark
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0246287