Big Data Integration in Genomic Analysis: Applications to Genetically Modified Corn (Zea Mays)




Bibliographic Details
Published in	International Journal of Big Data Intelligence and Applications, Vol. 6, no. 1, pp. 1-30
Main Authors	Segall, Richard S.; Rajbhandari, Prasanna
Format	Journal Article
Language	English
Published	Dallas: IGI Global, 12.08.2025
Subjects	Big Data; Corn; Data integration; Genetic modification; Quality control
ISSN	2644-1675, 2644-1683
DOI	10.4018/IJBDIA.387389

Summary:	This paper explores the integration of big data techniques into the genomic analysis pipeline, with an emphasis on methodological rigor and scalable computational frameworks. It presents a comprehensive, step-by-step approach to preprocessing raw FASTQ files using Biopython, NumPy, and Pandas for quality control and visualization. Key metrics are visualized with Matplotlib and Seaborn, providing insights into sample quality, contamination risks, and optimal trimming strategies. Portions of datasets containing over a million genomic values for genetically modified corn (Zea mays) are used to illustrate practical challenges such as read variability and quality degradation, supporting the use of quality-based filtering. Related work on big data, genomics, and genomic tools and pipelines is also reviewed.
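The abstract does not reproduce the authors' code. As a minimal sketch of the kind of FASTQ quality control it describes, assuming standard four-line FASTQ records with Phred+33 quality encoding, the stdlib-only Python below parses reads, computes a per-position mean quality profile (the curve where quality degradation toward the end of reads commonly shows up), and keeps reads whose mean Phred score meets a threshold. The function names (`parse_fastq`, `per_position_mean`, `quality_filter`), the toy reads, and the 21.0 threshold are hypothetical, chosen for illustration; they are not the paper's data or parameters. In a Biopython-based pipeline, `SeqIO.parse(handle, "fastq")` yields records whose decoded scores are available as `record.letter_annotations["phred_quality"]`.

```python
from dataclasses import dataclass
from io import StringIO
from typing import Iterator, List, TextIO


@dataclass
class Read:
    """One FASTQ record: identifier, bases, and decoded Phred scores."""
    read_id: str
    seq: str
    quals: List[int]


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[Read]:
    """Yield reads from a four-line-per-record FASTQ stream.

    Assumes Phred+33 quality encoding (the `offset` default).
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        read_id = header[1:].partition(" ")[0]  # drop any description after the ID
        yield Read(read_id, seq, [ord(c) - offset for c in qual])


def per_position_mean(reads: List[Read]) -> List[float]:
    """Mean Phred score at each read position; reads may differ in length."""
    max_len = max((len(r.quals) for r in reads), default=0)
    profile = []
    for i in range(max_len):
        # Only reads long enough to reach position i contribute to it.
        column = [r.quals[i] for r in reads if i < len(r.quals)]
        profile.append(sum(column) / len(column))
    return profile


def quality_filter(reads: List[Read], min_mean_q: float = 20.0, min_len: int = 1) -> List[Read]:
    """Keep non-empty reads at least `min_len` long whose mean Phred score is >= `min_mean_q`."""
    return [
        r for r in reads
        if r.quals and len(r.quals) >= min_len
        and sum(r.quals) / len(r.quals) >= min_mean_q
    ]


# Hypothetical toy reads: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
SAMPLE = """@r1
ACGTACGT
+
IIIIIIII
@r2
ACGTACGT
+
IIII####
@r3
ACGTAC
+
555555
"""

reads = list(parse_fastq(StringIO(SAMPLE)))
profile = per_position_mean(reads)
kept = quality_filter(reads, min_mean_q=21.0)

print([len(r.seq) for r in reads])     # [8, 8, 6]
print([round(q, 1) for q in profile])  # [33.3, 33.3, 33.3, 33.3, 20.7, 20.7, 21.0, 21.0]
print([r.read_id for r in kept])       # ['r1', 'r2']
```

The falling profile mirrors the quality degradation the abstract mentions, and the unequal read lengths illustrate read variability. Whole-read mean filtering is the simplest form of quality-based filtering; sliding-window trimming, which cuts only a poor-quality tail, is a common alternative when most of a read is usable. The per-read and per-position values could equally be loaded into a Pandas DataFrame for plotting with Seaborn.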