Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data

Bibliographic Details
Published in: 2024 IEEE Congress on Evolutionary Computation (CEC), pp. 01 - 08
Main Authors: Demir, Kaan; Nguyen, Bach H.; Rooney, Jeremy S.; Xue, Bing; Zhang, Mengjie; Lagutin, Kirill; MacKenzie, Andrew; Gordon, Keith C.; Killeen, Daniel P.
Format: Conference Proceeding
Language: English
Published: IEEE, 30.06.2024
Summary: Assessing biochemical compositions of biomass from the fishing industry is challenging due to the complexity and seasonal variability of biological samples. Raman spectroscopy is often applied to measure complex biological samples, thereby enabling rapid quality control by associating the spectra with biochemical reference data using methods such as partial least squares regression. However, a small number of samples, noisy or misleading signals, and collinearity, all often seen in real-world spectroscopic data, can negatively impact the fitting quality and inference capability of partial least squares regression. Feature selection is widely used to select a small and informative subset of the original features that can improve modeling performance; however, this is not always easy to achieve, especially given the aforementioned issues inherent to spectroscopic data. We address these issues by proposing a Genetic Algorithm-based feature selection approach for spectroscopic data acquired from New Zealand hoki and mackerel species. First, we apply the mathematical correction to the Raman signal that is most suited to each target composition, thereby reducing the effect of noise, irrelevant optical artifacts, and misleading signals. Next, we carefully curate a cross-validated feature selection process to circumvent the low number of samples, using a new representation and fitness function to reduce regression error and balance model complexity. Our findings indicate that the proposed method can improve the fitting quality and inference capability of partial least squares regression over using the full set of spectroscopic data. Lastly, we analyse the density of selected features to highlight the most salient signals.
DOI: 10.1109/CEC60901.2024.10612136
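
Illustrative sketch: The summary describes a Genetic Algorithm that selects spectral features by minimising cross-validated partial least squares (PLS) regression error while keeping model complexity in check. The record does not give the paper's representation or fitness function, so the following is only a generic sketch of that idea, not the authors' method. Everything in it is an assumption made for illustration: the plain binary-mask representation, tournament selection, uniform crossover, bit-flip mutation, the penalty weight `alpha`, the small NumPy PLS1 routine, and the synthetic data standing in for Raman spectra.

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Single-response PLS (NIPALS). Returns coefficients and centering terms."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(min(n_comp, X.shape[1])):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # no covariance left to model
            break
        w = w / norm
        t = Xk @ w                           # scores for this component
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        W.append(w); P.append(p); q.append(yk @ t / tt)
        Xk = Xk - np.outer(t, p)             # deflate X and y
        yk = yk - q[-1] * t
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q), x_mean, y_mean

def cv_rmse(X, y, mask, n_comp=3, k=5):
    """k-fold cross-validated RMSE of PLS on the features selected by mask."""
    if not mask.any():
        return np.inf                        # an empty subset is never acceptable
    Xs = X[:, mask]
    folds = np.arange(len(y)) % k            # fixed folds keep fitness deterministic
    sq_err = 0.0
    for f in range(k):
        tr, te = folds != f, folds == f
        coef, xm, ym = pls1_fit(Xs[tr], y[tr], n_comp)
        sq_err += np.sum((y[te] - ((Xs[te] - xm) @ coef + ym)) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def fitness(X, y, mask, alpha=0.05):
    """Lower is better: CV error plus an assumed penalty on the fraction kept."""
    return cv_rmse(X, y, mask) + alpha * mask.mean()

def ga_select(X, y, pop_size=30, n_gen=25, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    pop[0] = True                            # include the full-spectrum baseline
    for _ in range(n_gen):
        scores = np.array([fitness(X, y, m) for m in pop])
        children = [pop[np.argmin(scores)].copy()]   # elitism: keep the best
        while len(children) < pop_size:
            i, j = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            a = pop[i[np.argmin(scores[i])]]         # tournament winners
            b = pop[j[np.argmin(scores[j])]]
            child = np.where(rng.random(n_feat) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(n_feat) < p_mut               # bit-flip mutation
            children.append(child)
        pop = np.array(children)
    scores = np.array([fitness(X, y, m) for m in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Synthetic stand-in for spectra: few samples, many features, three informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 60))
y = X[:, 5] - 2 * X[:, 20] + 0.5 * X[:, 41] + 0.1 * rng.normal(size=40)
mask, score = ga_select(X, y)
baseline = fitness(X, y, np.ones(60, dtype=bool))
print(int(mask.sum()), "features kept; fitness", round(score, 3), "vs full set", round(baseline, 3))
```

Because the full-spectrum mask is placed in the initial population and the best individual is carried forward each generation, the returned subset can never score worse than the full feature set under this fitness. With so few samples, the cross-validated error is itself noisy, so a selected subset should be checked on held-out data before drawing conclusions.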