Genome-wide exploratory analysis for NARAC dataset with preparation for haplotype block partitioning through minor allele frequency quality control viewpoint

Bibliographic Details
Published in: Iran Journal of Computer Science (Online), Vol. 6, no. 4, pp. 387-396
Main Authors: Saad, Mohamed N.; Zareef, Galena W.; Ibrahim, Fatma S.; Said, Ashraf M.; Hamed, Hisham F. A.
Format: Journal Article
Language: English
Published: Cham, Springer International Publishing, 01.12.2023
Summary: This article describes, analyzes, and visualizes a case–control genome-wide genotype dataset from the North American Rheumatoid Arthritis Consortium (NARAC). The data are summarized by the number of females and males among cases and controls and by the percentage of missing data. Alleles and genotypes are counted, and the minor allele frequency (MAF) is calculated for each single nucleotide polymorphism (SNP). The SNPs are then classified by MAF into four categories: very rare, rare, low frequency, and common. The chromosomal locations of the SNPs in each category are examined to determine the proportion that fall in coding locations versus other regions. Each category is observed to have a different proportion in each consequence-annotation region. The composition of the data in terms of alleles and genotypes is found to be greatly disproportionate. The results give clear insights into the data and its MAF that can be compared with other datasets, and they can help researchers gain a comprehensive and accurate understanding of such case–control datasets.
ISSN: 2520-8438, 2520-8446
DOI:10.1007/s42044-023-00147-8
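
The per-SNP MAF calculation and the four-way classification mentioned in the summary can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes genotypes are encoded as the count (0, 1, or 2) of one allele per individual, with None for a missing call. The summary does not state the category boundaries, so the thresholds below (0.1%, 1%, 5%) are hypothetical placeholders, as are the function and constant names.

```python
from typing import Optional, Sequence

# Hypothetical cut-offs: the summary names the four categories
# but does not give their MAF boundaries.
VERY_RARE_MAX = 0.001
RARE_MAX = 0.01
LOW_FREQ_MAX = 0.05


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the MAF of one SNP, or None if every call is missing.

    Each genotype is the count (0, 1, 2) of one allele in a diploid
    individual; None marks a missing genotype and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    # Each called individual contributes two alleles.
    freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of the two alleles is less frequent.
    return min(freq, 1.0 - freq)


def maf_category(maf: float) -> str:
    """Map a MAF value to one of the four categories (assumed thresholds)."""
    if maf < VERY_RARE_MAX:
        return "very rare"
    if maf < RARE_MAX:
        return "rare"
    if maf < LOW_FREQ_MAX:
        return "low frequency"
    return "common"
```

For example, `minor_allele_frequency([0, 0, 0, 1])` counts one copy of the allele among eight, and `maf_category` would label that SNP as common under the assumed thresholds.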