Big Data Integration in Genomic Analysis: Applications to Genetically Modified Corn (Zea Mays)

Bibliographic Details
Published in International journal of big data intelligence and applications, Vol. 6, no. 1, pp. 1-30
Main Authors Segall, Richard S, Rajbhandari, Prasanna
Format Journal Article
Language English
Published Dallas IGI Global 12.08.2025
ISSN 2644-1675, 2644-1683
DOI 10.4018/IJBDIA.387389

Summary: This paper explores the integration of big data techniques into the genomic analysis pipeline, with an emphasis on methodological rigor and scalable computational frameworks. It presents a comprehensive, step-by-step approach to preprocessing raw FASTQ files using Biopython, NumPy, and Pandas for quality control and visualization. Key metrics are visualized with Matplotlib and Seaborn software, providing insights into sample quality, contamination risks, and optimal trimming strategies. Portions of datasets containing over a million genomic values for genetically modified corn (Zea Mays) are used to illustrate practical challenges such as read variability and quality degradation, supporting the use of quality-based filtering. Related work on big data, genomics, and genomic tools and pipelines is also reviewed.
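
The quality-based filtering mentioned in the summary can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses only the Python standard library in place of the Biopython, NumPy, and Pandas stack named above, and the function names (read_fastq, filter_by_mean_quality), the Phred+33 encoding, the threshold of 20, and the two example reads are all assumptions made for illustration.

```python
import io
from statistics import mean

# Assumption: quality strings use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        scores = [ord(ch) - PHRED_OFFSET for ch in quality]
        yield header[1:], sequence, scores  # drop the leading '@'


def filter_by_mean_quality(records, min_mean_quality=20):
    """Keep reads whose mean Phred score is at least min_mean_quality (hypothetical cutoff)."""
    return [rec for rec in records if rec[2] and mean(rec[2]) >= min_mean_quality]


# Two made-up reads: 'I' encodes Phred 40, '#' encodes Phred 2.
example = io.StringIO(
    "@read_high\nACGTACGT\n+\nIIIIIIII\n"
    "@read_low\nACGTACGT\n+\n########\n"
)
records = list(read_fastq(example))
kept = filter_by_mean_quality(records)
kept_ids = [read_id for read_id, _, _ in kept]
print(kept_ids)  # ['read_high']
```

In this sketch the low-quality read is discarded because its mean Phred score (2) falls below the assumed cutoff, while the high-quality read (mean 40) is retained. A real workflow would more likely parse records with a dedicated FASTQ reader and tune the threshold to the data.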