Big Data Integration in Genomic Analysis: Applications to Genetically Modified Corn (Zea Mays)
Published in | International journal of big data intelligence and applications Vol. 6; no. 1; pp. 1 - 30 |
---|---|
Format | Journal Article |
Language | English |
Published | Dallas: IGI Global, 12.08.2025 |
ISSN | 2644-1675 2644-1683 |
DOI | 10.4018/IJBDIA.387389 |
Summary: | This paper explores the integration of big data techniques into the genomic analysis pipeline, with an emphasis on methodological rigor and scalable computational frameworks. It presents a comprehensive, step-by-step approach to preprocessing raw FASTQ files using Biopython, NumPy, and Pandas for quality control and visualization. Key metrics are visualized with Matplotlib and Seaborn, providing insights into sample quality, contamination risks, and optimal trimming strategies. Portions of datasets containing over a million genomic values for genetically modified corn (Zea mays) are used to illustrate practical challenges such as read variability and quality degradation, supporting the use of quality-based filtering. Related work on big data, genomics, and genomic tools and pipelines is also reviewed. |
---|---|
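
The summary refers to quality-based filtering of FASTQ reads, but this record does not include the authors' code. The following is a minimal sketch of the general idea, not the paper's implementation. It assumes Phred+33 encoded quality strings and a hypothetical mean-quality threshold of 20, and it uses only the Python standard library rather than the Biopython, NumPy, and Pandas stack named in the summary. The function names (`parse_fastq`, `phred_scores`, `filter_by_mean_quality`) and the sample reads are illustrative assumptions.

```python
import io
from itertools import islice
from statistics import mean


def parse_fastq(handle):
    """Yield (read_id, sequence, quality_string) from an iterable of FASTQ lines.

    Assumes the common four-line-per-record layout; blank lines are skipped.
    """
    lines = (ln.strip() for ln in handle)
    lines = (ln for ln in lines if ln)
    while True:
        record = list(islice(lines, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = record
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch: " + header)
        yield header[1:], seq, qual


def phred_scores(quality_string, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def filter_by_mean_quality(records, min_mean_q=20.0):
    """Yield only the records whose mean Phred score is at least min_mean_q.

    The threshold of 20 is an illustrative default, not a value from the paper.
    """
    for read_id, seq, qual in records:
        scores = phred_scores(qual)
        if scores and mean(scores) >= min_mean_q:
            yield read_id, seq, qual


# Three made-up reads: 'I' decodes to Q40, '!' to Q0, '5' to Q20.
SAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n@read3\nACGT\n+\n5555\n"

kept_ids = [rid for rid, _, _ in filter_by_mean_quality(parse_fastq(io.StringIO(SAMPLE)))]
print(kept_ids)  # ['read1', 'read3']
```

Because the parser is a generator, reads are processed one at a time, which matters for files with millions of records. A real pipeline would more likely rely on an established parser and on trimming low-quality read ends rather than only discarding whole reads.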