Predicting the performance of medium-chain carboxylic acid (MCCA) production using machine learning algorithms and microbial community data


Bibliographic Details
Published in Journal of cleaner production Vol. 377; p. 134223
Main Authors Long, Fei; Fan, Joshua; Xu, Weichao; Liu, Hong
Format Journal Article
Language English
Published Elsevier Ltd 01.12.2022
Summary: The carboxylate platform-based bioprocess for medium-chain carboxylic acid (MCCA) production from waste biomass via mixed culture has been the subject of extensive research because of the high economic value of MCCA and its potential environmental benefits. However, modeling the conversion process with mechanistic models is challenging because the interactions and metabolic pathways of the system are complex and unclear. Herein, four data-driven machine learning (ML) algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), and artificial neural network (ANN), were employed to predict the MCCA concentration and production rate based on data (environmental and operational parameters and corresponding genomic data) collected under 94 experimental conditions from 8 research groups. All selected ML algorithms achieved prediction accuracy higher than 0.7 using operational parameters only. A significant improvement in the predictive efficacy (ranging from 0.83 to 0.87) was observed when incorporating the genomic data with environmental and operational parameters. The RF model had the highest prediction accuracy for MCCA production: 0.83, 0.87, and 0.89 when the operational parameters, genomic data, and combined dataset were used as input parameters, respectively. Hydraulic retention time (HRT) and organic loading rate (OLR) were identified as the dominant operational parameters that affect the MCCA concentration and rate based on the feature importance generated by RF. The key microbes that affected the MCCA concentration and MCCA production rate were different. Bacteroidales and Coriobacteriales were the only orders sensitive to both the MCCA concentration and rate, with feature importance weights of 6.71% and 6.97%, respectively, and could be potential universal biomarkers for process monitoring.
The results demonstrated that the proposed ML models could be used as a means of simulating the carboxylate platform for MCCA production from waste feedstock, enhancing the understanding of the behavior of microorganisms in the process, and providing guidance for further optimization.
• A dataset comprising operational parameters and genomic data was developed.
• Four ML algorithms have been used to predict the MCCA concentration and rate.
• RF achieved the best performance (∼0.89) in the prediction of MCCA generation.
• The effects of input parameters on model performance were fully studied.
• The key parameters and microbial groups affecting MCCA generation were identified.
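The record does not include the authors' code or data. As a minimal sketch of the kind of workflow the summary describes, the Python below fits a k-nearest neighbor regressor (one of the four algorithm families named above) to two operational inputs and scores it on held-out rows. Everything specific here is an assumption made for illustration: the data are synthetic and not from the paper, the feature names `hrt` and `olr` are stand-ins scaled to [0, 1], the target formula is invented, and the coefficient of determination (R²) is used as the score because the summary does not state which accuracy metric was reported.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic, illustrative data only (NOT the dataset from the paper).
# Each row is (hrt, olr), both hypothetical inputs already scaled to [0, 1].
rng = random.Random(0)
X = [(rng.random(), rng.random()) for _ in range(60)]
# Invented target: a noisy linear combination standing in for MCCA concentration.
y = [2.0 * hrt + olr + rng.gauss(0.0, 0.05) for hrt, olr in X]

# Hold out the last 15 rows for evaluation.
train_X, test_X = X[:45], X[45:]
train_y, test_y = y[:45], y[45:]

predictions = [knn_predict(train_X, train_y, row, k=3) for row in test_X]
print(f"Held-out R^2 on synthetic data: {r2_score(test_y, predictions):.2f}")
```

A tree ensemble such as RF would additionally expose per-feature importance scores, which is how the summary says HRT and OLR were ranked; this pure-Python KNN sketch does not reproduce that step.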
ISSN: 0959-6526, 1879-1786
DOI: 10.1016/j.jclepro.2022.134223