Big Data Integration in Genomic Analysis: Applications to Genetically Modified Corn (Zea Mays)
Published in | International Journal of Big Data Intelligence and Applications, Vol. 6, no. 1, pp. 1-30 |
---|---|
Main Authors | Segall, Richard S; Rajbhandari, Prasanna |
Format | Journal Article |
Language | English |
Published | Dallas: IGI Global, 12 August 2025 |
ISSN | 2644-1675 (ISSN), 2644-1683 (EISSN) |
DOI | 10.4018/IJBDIA.387389 |
Abstract | This paper explores the integration of big data techniques into the genomic analysis pipeline, with an emphasis on methodological rigor and scalable computational frameworks. It presents a step-by-step approach to preprocessing raw FASTQ files using Biopython, NumPy, and Pandas for quality control and visualization. Key metrics are visualized with Matplotlib and Seaborn, providing insights into sample quality, contamination risks, and optimal trimming strategies. Portions of datasets containing over a million genomic values for genetically modified corn (Zea mays) are used to illustrate practical challenges such as read variability and quality degradation, supporting the use of quality-based filtering. Related work on big data, genomics, and genomic tools and pipelines is also reviewed. |
---|---|
Author | Segall, Richard S; Rajbhandari, Prasanna |
AuthorAffiliation | Arkansas State University, USA |
Copyright | 2025. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
Discipline | Computer Science |
IsPeerReviewed | false |
IsScholarly | true |
License | http://creativecommons.org/licenses/by/3.0/deed.en_US |
ORCID | 0009-0009-3394-2137 0000-0001-7627-2609 |
Subjects | Big Data; Corn; Data integration; Genetic modification; Quality control |
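
The abstract describes quality control of raw FASTQ reads and quality-based filtering. The sketch below illustrates that general idea only; it is not the authors' pipeline. The paper reports using Biopython, NumPy, and Pandas, whereas this example deliberately uses only the Python standard library so that it runs anywhere. The function names, the Phred+33 offset, the threshold of 20, and the three example reads are all assumptions made for illustration.

```python
import io
from statistics import mean

PHRED_OFFSET = 33  # assumption: Sanger / Illumina 1.8+ quality encoding


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        scores = [ord(ch) - PHRED_OFFSET for ch in qual]
        yield header[1:].split()[0], seq, scores


def filter_by_mean_quality(records, min_mean_q=20):
    """Keep reads whose mean Phred score is at least min_mean_q (illustrative threshold)."""
    for read_id, seq, scores in records:
        if scores and mean(scores) >= min_mean_q:
            yield read_id, seq, scores


def per_position_mean_quality(records):
    """Return the mean Phred score at each read position, useful for choosing a trim point."""
    totals, counts = [], []
    for _, _, scores in records:
        for i, q in enumerate(scores):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += q
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical three-read example; these are not data from the paper.
EXAMPLE_FASTQ = (
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n!!!!\n"
    "@read3\nACG\n+\nI5!\n"
)

records = list(parse_fastq(io.StringIO(EXAMPLE_FASTQ)))
kept_ids = [read_id for read_id, _, _ in filter_by_mean_quality(records)]
position_means = per_position_mean_quality(records)
```

The per-position means are the kind of metric that is typically plotted to spot quality degradation toward the 3' end of reads, and the per-read filter corresponds to the quality-based filtering the abstract mentions. In a Biopython-based workflow, the parsing step would normally be handled by the library rather than by hand.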