Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data
Assessing biochemical compositions of biomass from the fishing industry is challenging due to the complexity and seasonal variability of biological samples. Raman spectroscopy is often applied to measure complex biological samples, thereby enabling rapid quality control by associating the spectra wi...
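The approach named in the title, genetic-algorithm feature selection scored by partial least squares regression, might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the binary-mask representation, the penalty weight `alpha`, the NIPALS-style `pls1_fit` helper, the function names, and the synthetic data are all assumptions made for the example, and the paper's own representation, fitness function, and spectral correction step are not reproduced here.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS (NIPALS); return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:            # nothing left to explain
            break
        w = w / norm
        t = Xk @ w                  # scores
        tt = t @ t
        p = Xk.T @ t / tt           # X loadings
        c = yk @ t / tt             # y loading
        Xk = Xk - np.outer(t, p)    # deflate X and y
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean

def cv_rmse(X, y, mask, n_components=3, n_folds=5):
    """k-fold cross-validated RMSE of PLS on the features selected by mask."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return np.inf
    idx = np.arange(len(y))         # real data should be shuffled first
    sq_err = 0.0
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        k = min(n_components, cols.size, len(train) - 1)
        coef, xm, ym = pls1_fit(X[np.ix_(train, cols)], y[train], k)
        pred = (X[np.ix_(test, cols)] - xm) @ coef + ym
        sq_err += np.sum((y[test] - pred) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def fitness(X, y, mask, alpha=0.05):
    """Assumed fitness: CV error plus a penalty on the fraction of features kept."""
    return cv_rmse(X, y, mask) + alpha * mask.mean()

def ga_select(X, y, pop_size=30, generations=25, alpha=0.05, seed=0):
    """Minimise the fitness over binary feature masks with a simple GA."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5
    pop[0] = True                   # seed with the full feature set as a baseline
    scores = np.array([fitness(X, y, m, alpha) for m in pop])
    for _ in range(generations):
        children = []
        for _ in range(pop_size - 1):
            # Binary tournament selection for each parent.
            a, b, c, d = rng.integers(pop_size, size=4)
            p1 = pop[a] if scores[a] < scores[b] else pop[b]
            p2 = pop[c] if scores[c] < scores[d] else pop[d]
            cross = rng.random(n_feat) < 0.5            # uniform crossover
            child = np.where(cross, p1, p2)
            flip = rng.random(n_feat) < 1.0 / n_feat    # bit-flip mutation
            children.append(child ^ flip)
        elite = pop[np.argmin(scores)]                  # elitism: keep the best
        pop = np.array([elite] + children)
        scores = np.array([fitness(X, y, m, alpha) for m in pop])
    best = int(np.argmin(scores))
    return pop[best], float(scores[best])

# Synthetic stand-in for spectra: 40 samples, 30 "wavenumber" features,
# of which only features 3 and 7 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=40)

mask, score = ga_select(X, y)
full_score = fitness(X, y, np.ones(30, dtype=bool))
```

Because the initial population in this sketch includes the all-features mask and the best individual is always carried forward, the returned mask can never score worse than the full feature set under this particular fitness. That is a property of the sketch's design, not a claim about the paper's results.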
Published in | 2024 IEEE Congress on Evolutionary Computation (CEC) pp. 01 - 08 |
---|---|
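Main Authors | Kaan Demir, Bach H. Nguyen, Jeremy S. Rooney, Bing Xue, Mengjie Zhang, Kirill Lagutin, Andrew MacKenzie, Keith C. Gordon, Daniel P. Killeen |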
Format | Conference Proceeding |
Language | English |
Published | IEEE, 30.06.2024 |
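Subjects | Biomass; Complexity theory; Feature extraction; feature selection; Fitting; marine biomass; Performance gain; Quality control; Raman scattering; regression; spectroscopy |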
Abstract | Assessing biochemical compositions of biomass from the fishing industry is challenging due to the complexity and seasonal variability of biological samples. Raman spectroscopy is often applied to measure complex biological samples, thereby enabling rapid quality control by associating the spectra with biochemical reference data using methods such as partial least squares regression. However, a small number of samples, noisy or misleading signals, and collinearity, often seen in real-world spectroscopic data, can negatively impact the fitting quality and inference capability of partial least squares regression. Feature selection is widely used to select a small and informative subset of the original features that can improve modeling performance; however, this is not always easy to achieve, especially given the aforementioned issues inherent to spectroscopic data. We address these issues by proposing a Genetic Algorithm-based feature selection approach for spectroscopic data acquired from New Zealand hoki and mackerel species. First, we apply a mathematical correction to the Raman signal most suited for each target composition, thereby reducing the effect of noise, irrelevant optical artifacts, and misleading signals. Next, we carefully curate a cross-validated feature selection process to circumvent the low number of samples, using a new representation and fitness function to reduce regression error and balance model complexity. Our findings indicate that the proposed method can improve the fitting quality and inference capability of partial least squares regression over using the full set of spectroscopic data. Lastly, we analyse the density of selected features to highlight the most salient signals. |
---|---|
Author | Xue, Bing; Zhang, Mengjie; Demir, Kaan; MacKenzie, Andrew; Killeen, Daniel P.; Gordon, Keith C.; Nguyen, Bach H.; Rooney, Jeremy S.; Lagutin, Kirill |
Author_xml | – sequence: 1 givenname: Kaan surname: Demir fullname: Demir, Kaan email: demirkaan@ecs.vuw.ac.nz organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington – sequence: 2 givenname: Bach H. surname: Nguyen fullname: Nguyen, Bach H. email: bach.nguyen@ecs.vuw.ac.nz organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington – sequence: 3 givenname: Jeremy S. surname: Rooney fullname: Rooney, Jeremy S. email: jeremy.rooney@otago.ac.nz organization: University of Otago,Department of Chemistry,Dunedin,New Zealand – sequence: 4 givenname: Bing surname: Xue fullname: Xue, Bing email: bing.xue@ecs.vuw.ac.nz organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington – sequence: 5 givenname: Mengjie surname: Zhang fullname: Zhang, Mengjie email: mengjie.zhang@ecs.vuw.ac.nz organization: Centre for Data Science and Artificial Intelligence & School of ECS, Victoria University of Wellington – sequence: 6 givenname: Kirill surname: Lagutin fullname: Lagutin, Kirill email: kirill.lagutin@callaghaninnovation.govt.nz organization: Callaghan Innovation,Lower Hutt,New Zealand – sequence: 7 givenname: Andrew surname: MacKenzie fullname: MacKenzie, Andrew email: andrew.mackenzie@callaghaninnovation.govt.nz organization: Callaghan Innovation,Lower Hutt,New Zealand – sequence: 8 givenname: Keith C. surname: Gordon fullname: Gordon, Keith C. email: keith.gordon@otago.ac.nz organization: University of Otago,Department of Chemistry,Dunedin,New Zealand – sequence: 9 givenname: Daniel P. surname: Killeen fullname: Killeen, Daniel P. email: daniel.killeen@plantandfood.co.nz organization: The New Zealand Institute for Plant and Food Research Limited,Nelson,New Zealand |
ContentType | Conference Proceeding |
DBID | 6IE 6IL CBEJK RIE RIL |
DOI | 10.1109/CEC60901.2024.10612136 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP All) 1998-Present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore Digital Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 9798350308365 |
EndPage | 08 |
ExternalDocumentID | 10612136 |
Genre | orig-research |
GroupedDBID | 6IE 6IL CBEJK RIE RIL |
ID | FETCH-ieee_primary_106121363 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 14 05:40:31 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-ieee_primary_106121363 |
ParticipantIDs | ieee_primary_10612136 |
PublicationCentury | 2000 |
PublicationDate | 2024-June-30 |
PublicationDateYYYYMMDD | 2024-06-30 |
PublicationDate_xml | – month: 06 year: 2024 text: 2024-June-30 day: 30 |
PublicationDecade | 2020 |
PublicationTitle | 2024 IEEE Congress on Evolutionary Computation (CEC) |
PublicationTitleAbbrev | CEC |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
Score | 3.851313 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 01 |
SubjectTerms | Biomass; Complexity theory; Feature extraction; feature selection; Fitting; marine biomass; Performance gain; Quality control; Raman scattering; regression; spectroscopy |
Title | Improving Inference of Biochemical Composition in Marine Biomass via Genetic Algorithm-Based Feature Selection on Raman Spectroscopic Data |
URI | https://ieeexplore.ieee.org/document/10612136 |