Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data

Bibliographic Details
Published in 2024 IEEE Congress on Evolutionary Computation (CEC) pp. 01 - 08
Main Authors Demir, Kaan; Nguyen, Bach H.; Rooney, Jeremy S.; Xue, Bing; Zhang, Mengjie; Lagutin, Kirill; MacKenzie, Andrew; Gordon, Keith C.; Killeen, Daniel P.
Format Conference Proceeding
Language English
Published IEEE 30.06.2024
Abstract Assessing the biochemical composition of biomass from the fishing industry is challenging due to the complexity and seasonal variability of biological samples. Raman spectroscopy is often applied to measure complex biological samples, enabling rapid quality control by associating the spectra with biochemical reference data using methods such as partial least squares regression. However, a small number of samples, noisy or misleading signals, and collinearity, all often seen in real-world spectroscopic data, can negatively impact the fitting quality and inference capability of partial least squares regression. Feature selection is widely used to select a small and informative subset of the original features that can improve modeling performance; however, this is not always easy to achieve, especially given the aforementioned issues inherent to spectroscopic data. We address these issues by proposing a Genetic Algorithm-based feature selection approach for spectroscopic data acquired from New Zealand hoki and mackerel species. First, we apply to the Raman signal the mathematical correction most suited to each target composition, thereby reducing the effect of noise, irrelevant optical artifacts, and misleading signals. Next, we carefully design a cross-validated feature selection process to circumvent the low number of samples, using a new representation and fitness function to reduce regression error and balance model complexity. Our findings indicate that the proposed method can improve the fitting quality and inference capability of partial least squares regression over using the full set of spectroscopic data. Lastly, we analyse the density of selected features to highlight the most salient signals.
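The abstract describes a cross-validated, Genetic Algorithm-based feature selection wrapped around partial least squares regression, but this record does not give the paper's representation or fitness function. The following is only a minimal illustrative sketch under stated assumptions: a plain binary mask per spectral feature, a fitness equal to cross-validated RMSE plus a small penalty on the fraction of features kept (the weight `alpha` is hypothetical), and a single-component PLS1 regressor written in plain Python as a stand-in for a full PLS implementation. All function names (`pls1_fit_predict`, `fitness`, `ga_select`, `demo_data`) and the synthetic data are invented for illustration and are not the authors' method.

```python
import random

def pls1_fit_predict(X_tr, y_tr, X_te):
    """Fit a one-component PLS1 model on the training data, predict X_te."""
    n, p = len(X_tr), len(X_tr[0])
    x_mean = [sum(row[j] for row in X_tr) / n for j in range(p)]
    y_mean = sum(y_tr) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X_tr]
    yc = [v - y_mean for v in y_tr]
    # Weight vector: direction of maximal covariance between X and y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0.0:  # no covariance with y: fall back to the mean
        return [y_mean] * len(X_te)
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
    q = sum(t[i] * yc[i] for i in range(n)) / sum(v * v for v in t)
    return [y_mean + q * sum((row[j] - x_mean[j]) * w[j] for j in range(p))
            for row in X_te]

def fitness(mask, X, y, k=5, alpha=0.01):
    """Assumed fitness: k-fold CV RMSE plus alpha * fraction of features kept."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return float("inf")  # an empty subset cannot be modelled
    Xs = [[row[j] for j in cols] for row in X]
    n, sq_err = len(y), 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        preds = pls1_fit_predict([Xs[i] for i in train],
                                 [y[i] for i in train],
                                 [Xs[i] for i in test])
        sq_err += sum((pr - y[i]) ** 2 for pr, i in zip(preds, test))
    return (sq_err / n) ** 0.5 + alpha * len(cols) / len(mask)

def ga_select(X, y, pop_size=16, generations=10, p_mut=0.05, seed=0):
    """Tiny GA: tournament selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    p = len(X[0])
    # Seed the population with the full feature set so the result is never
    # worse than using all features under this fitness.
    pop = [[1] * p] + [[rng.randint(0, 1) for _ in range(p)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = [(fitness(m, X, y), m) for m in pop]
        _, elite = min(scored, key=lambda s: s[0])

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] <= b[0] else b[1]

        children = [elite[:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            pa, pb = tournament(), tournament()
            cut = rng.randrange(1, p)
            child = pa[:cut] + pb[cut:]
            children.append([1 - g if rng.random() < p_mut else g
                             for g in child])
        pop = children
    best_score, best = min(((fitness(m, X, y), m) for m in pop),
                           key=lambda s: s[0])
    return best, best_score

def demo_data(n=30, p=12, seed=1):
    """Synthetic data: two collinear informative features, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        signal = rng.gauss(0, 1)
        row = [signal + rng.gauss(0, 0.1), signal + rng.gauss(0, 0.1)]
        row += [rng.gauss(0, 1) for _ in range(p - 2)]
        X.append(row)
        y.append(2.0 * signal + rng.gauss(0, 0.1))
    return X, y

if __name__ == "__main__":
    X, y = demo_data()
    mask, score = ga_select(X, y)
    print("selected feature indices:", [j for j, b in enumerate(mask) if b])
    print("fitness of selected subset:", score)
    print("fitness of full feature set:", fitness([1] * len(X[0]), X, y))
```

Evaluating each candidate subset by cross-validation, rather than by fit on the training samples alone, is the part that addresses the small-sample problem the abstract mentions; the complexity penalty is one simple way to discourage large subsets, and other formulations are equally plausible.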
Author_xml – sequence: 1
  givenname: Kaan
  surname: Demir
  fullname: Demir, Kaan
  email: demirkaan@ecs.vuw.ac.nz
  organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington
– sequence: 2
  givenname: Bach H.
  surname: Nguyen
  fullname: Nguyen, Bach H.
  email: bach.nguyen@ecs.vuw.ac.nz
  organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington
– sequence: 3
  givenname: Jeremy S.
  surname: Rooney
  fullname: Rooney, Jeremy S.
  email: jeremy.rooney@otago.ac.nz
  organization: University of Otago,Department of Chemistry,Dunedin,New Zealand
– sequence: 4
  givenname: Bing
  surname: Xue
  fullname: Xue, Bing
  email: bing.xue@ecs.vuw.ac.nz
  organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington
– sequence: 5
  givenname: Mengjie
  surname: Zhang
  fullname: Zhang, Mengjie
  email: mengjie.zhang@ecs.vuw.ac.nz
  organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington
– sequence: 6
  givenname: Kirill
  surname: Lagutin
  fullname: Lagutin, Kirill
  email: kirill.lagutin@callaghaninnovation.govt.nz
  organization: Callaghan Innovation,Lower Hutt,New Zealand
– sequence: 7
  givenname: Andrew
  surname: MacKenzie
  fullname: MacKenzie, Andrew
  email: andrew.mackenzie@callaghaninnovation.govt.nz
  organization: Callaghan Innovation,Lower Hutt,New Zealand
– sequence: 8
  givenname: Keith C.
  surname: Gordon
  fullname: Gordon, Keith C.
  email: keith.gordon@otago.ac.nz
  organization: University of Otago,Department of Chemistry,Dunedin,New Zealand
– sequence: 9
  givenname: Daniel P.
  surname: Killeen
  fullname: Killeen, Daniel P.
  email: daniel.killeen@plantandfood.co.nz
  organization: The New Zealand Institute for Plant and Food Research Limited,Nelson,New Zealand
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CEC60901.2024.10612136
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore Digital Library
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350308365
EndPage 08
ExternalDocumentID 10612136
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-ieee_primary_106121363
IEDL.DBID RIE
IngestDate Wed Aug 14 05:40:31 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-ieee_primary_106121363
ParticipantIDs ieee_primary_10612136
PublicationCentury 2000
PublicationDate 2024-June-30
PublicationDateYYYYMMDD 2024-06-30
PublicationDate_xml – month: 06
  year: 2024
  text: 2024-June-30
  day: 30
PublicationDecade 2020
PublicationTitle 2024 IEEE Congress on Evolutionary Computation (CEC)
PublicationTitleAbbrev CEC
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 01
SubjectTerms Biomass
Complexity theory
Feature extraction
feature selection
Fitting
marine biomass
Performance gain
Quality control
Raman scattering
regression
spectroscopy
Title Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data
URI https://ieeexplore.ieee.org/document/10612136