Big Data Integration in Genomic Analysis: Applications to Genetically Modified Corn (Zea Mays)

Bibliographic Details
Published in: International Journal of Big Data Intelligence and Applications, Vol. 6, no. 1, pp. 1-30
Main Authors: Segall, Richard S; Rajbhandari, Prasanna
Format: Journal Article
Language: English
Published: Dallas: IGI Global, 12 August 2025
Subjects: Big Data; Corn; Data integration; Genetic modification; Quality control
ISSN: 2644-1675; EISSN: 2644-1683
DOI: 10.4018/IJBDIA.387389

Abstract
This paper explores the integration of big data techniques into the genomic analysis pipeline, with an emphasis on methodological rigor and scalable computational frameworks. It presents a comprehensive, step-by-step approach to preprocessing raw FASTQ files using Biopython, NumPy, and pandas for quality control and visualization. Key metrics are visualized with Matplotlib and Seaborn, providing insights into sample quality, contamination risks, and optimal trimming strategies. Portions of datasets containing over a million genomic values for genetically modified corn (Zea mays) are used to illustrate practical challenges such as read variability and quality degradation, supporting the use of quality-based filtering. Related work on big data, genomics, and genomic tools and pipelines is also reviewed.
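The preprocessing steps summarized in the abstract include decoding FASTQ quality scores, profiling quality along the read, and applying quality-based trimming and filtering. A minimal sketch of those steps follows. It is not the authors' code. It assumes 4-line FASTQ records with Phred+33 quality encoding (Sanger/Illumina 1.8+). The helper names and the thresholds (`min_mean_q=20`, `trim_q=20`, `min_len=5`) are hypothetical choices for demonstration. The sketch uses only the Python standard library, so it runs without Biopython, NumPy, or pandas. In a Biopython-based pipeline, `SeqIO.parse(handle, "fastq")` yields records whose `letter_annotations["phred_quality"]` list holds the same per-base scores.

```python
# Minimal FASTQ quality-control sketch (illustrative; not the paper's code).
# Assumes 4-line records and Phred+33 quality encoding; thresholds are hypothetical.
import io
from statistics import mean
from typing import Iterable, Iterator, List, NamedTuple, TextIO


class Read(NamedTuple):
    name: str
    seq: str
    quals: List[int]  # per-base Phred quality scores


def parse_fastq(handle: TextIO) -> Iterator[Read]:
    """Yield reads from a FASTQ stream, decoding Phred+33 quality characters."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield Read(header[1:], seq, [ord(c) - 33 for c in qual])


def per_position_mean(reads: List[Read]) -> List[float]:
    """Mean Phred score at each position; shorter reads simply stop contributing."""
    max_len = max(len(r.quals) for r in reads)
    return [
        float(mean(r.quals[i] for r in reads if i < len(r.quals)))
        for i in range(max_len)
    ]


def trim_3prime(read: Read, min_q: int) -> Read:
    """Remove trailing bases whose Phred score is below min_q."""
    end = len(read.quals)
    while end > 0 and read.quals[end - 1] < min_q:
        end -= 1
    return Read(read.name, read.seq[:end], read.quals[:end])


def quality_filter(
    reads: Iterable[Read], min_mean_q: float = 20.0, trim_q: int = 20, min_len: int = 5
) -> List[Read]:
    """Trim low-quality tails, then keep reads meeting length and mean-quality thresholds."""
    kept = []
    for read in reads:
        trimmed = trim_3prime(read, trim_q)
        if (
            trimmed.quals
            and len(trimmed.seq) >= min_len
            and mean(trimmed.quals) >= min_mean_q
        ):
            kept.append(trimmed)
    return kept


# Toy input: read2 degrades at its end, read3 is low quality throughout.
SAMPLE = """@read1
ACGTACGTAC
+
IIIIIIIIII
@read2
ACGTACGTAC
+
IIIIII####
@read3
ACGTACGTAC
+
##########
"""

reads = list(parse_fastq(io.StringIO(SAMPLE)))
print([round(q, 1) for q in per_position_mean(reads)])
# [27.3, 27.3, 27.3, 27.3, 27.3, 27.3, 14.7, 14.7, 14.7, 14.7]
print([(r.name, r.seq) for r in quality_filter(reads)])
# [('read1', 'ACGTACGTAC'), ('read2', 'ACGTAC')]
```

In the toy sample, the per-position means drop toward the end of the reads. Illumina data commonly show this pattern. On real data, such a profile would typically be plotted (for example with Matplotlib or Seaborn) before choosing trimming cut-offs. Trimming before computing the mean lets a read with a degraded tail, such as `read2`, survive in shortened form rather than being discarded outright.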

Author affiliation (both authors): Arkansas State University, USA
Copyright 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
ORCID iDs: 0009-0009-3394-2137; 0000-0001-7627-2609