Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data
Published in | 2024 IEEE Congress on Evolutionary Computation (CEC) pp. 01 - 08 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 30.06.2024 |
Summary: | Assessing biochemical compositions of biomass from the fishing industry is challenging due to the complexity and seasonal variability of biological samples. Raman spectroscopy is often applied to measure complex biological samples, thereby enabling rapid quality control by associating the spectra with biochemical reference data using methods such as partial least squares regression. However, a small number of samples, noisy or misleading signals, and collinearity, all often seen in real-world spectroscopic data, can negatively impact the fitting quality and inference capability of partial least squares regression. Feature selection is widely used to select a small and informative subset of the original features that can improve modeling performance; however, this is not always easy to achieve, especially given the aforementioned issues inherent to spectroscopic data. We address these issues by proposing a Genetic Algorithm-based feature selection approach for spectroscopic data acquired from New Zealand hoki and mackerel species. First, we apply to the Raman signal the mathematical correction best suited to each target composition, thereby reducing the effect of noise, irrelevant optical artifacts, and misleading signals. Next, we carefully curate a cross-validated feature selection process to circumvent the low number of samples, using a new representation and fitness function to reduce regression error and balance model complexity. Our findings indicate that the proposed method can improve the fitting quality and inference capability of partial least squares regression compared with using the full set of spectroscopic data. Lastly, we analyse the density of selected features to highlight the most salient signals. |
---|---|
DOI: | 10.1109/CEC60901.2024.10612136 |
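
The summary describes a cross-validated, Genetic Algorithm-based feature selection wrapped around partial least squares regression, but it does not specify the representation or the fitness function. The sketch below is therefore a generic illustration of that kind of pipeline, not the authors' method. It assumes a plain binary mask over spectral channels, tournament selection, uniform crossover, bit-flip mutation, and a fitness equal to the k-fold cross-validated RMSE of a small NIPALS PLS1 model plus a penalty on the fraction of selected features. All function names, parameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_predict(X_train, y_train, X_test, n_comp):
    """Fit a single-response PLS model (NIPALS) and predict for X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    X, y = X_train - x_mean, y_train - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = X.T @ y
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to model
            break
        w = w / norm
        t = X @ w                     # scores
        tt = t @ t
        p = X.T @ t / tt              # X loadings
        c = (y @ t) / tt              # y loading
        X = X - np.outer(t, p)        # deflate X and y
        y = y - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.full(X_test.shape[0], y_mean)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return (X_test - x_mean) @ coef + y_mean

def fitness(X, y, mask, n_comp=3, k=5, alpha=0.01):
    """Assumed fitness: k-fold CV RMSE plus a penalty on subset size."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return np.inf
    rows = np.arange(len(y))
    sq_err = 0.0
    for test in np.array_split(rows, k):
        train = np.setdiff1d(rows, test)
        pred = pls1_predict(X[np.ix_(train, idx)], y[train],
                            X[np.ix_(test, idx)], min(n_comp, idx.size))
        sq_err += np.sum((pred - y[test]) ** 2)
    return np.sqrt(sq_err / len(y)) + alpha * mask.mean()

def tournament(pop, scores, rng):
    """Pick the better of two randomly drawn individuals."""
    i, j = rng.choice(len(pop), size=2, replace=False)
    return pop[i] if scores[i] <= scores[j] else pop[j]

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    pop[0] = True                     # include the full spectrum as a baseline
    scores = np.array([fitness(X, y, m) for m in pop])
    for _ in range(generations):
        elite = pop[np.argmin(scores)].copy()   # elitism: keep the best mask
        children = [elite]
        while len(children) < pop_size:
            a = tournament(pop, scores, rng)
            b = tournament(pop, scores, rng)
            child = np.where(rng.random(n_feat) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(n_feat) < p_mut               # bit-flip mutation
            children.append(child)
        pop = np.vstack(children)
        scores = np.array([fitness(X, y, m) for m in pop])
    best = np.argmin(scores)
    return pop[best], scores[best]

# Synthetic stand-in for spectra: 40 samples, 60 channels, 2 informative ones.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(40, 60))
y = X[:, 5] - 2.0 * X[:, 20] + 0.1 * data_rng.normal(size=40)

mask, score = ga_select(X, y)
print("selected channels:", int(mask.sum()), "of", mask.size)
```

Because the same cross-validation folds drive the search, the reported fitness is optimistic; a held-out set or nested cross-validation would be needed for an unbiased error estimate. A library implementation such as scikit-learn's `PLSRegression` could replace the hand-written `pls1_predict`.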