Definition of alleles and altered regulatory motifs across Cas9-edited cell populations

Bibliographic Details
Published in bioRxiv
Main Authors Ehmsen, Kirk T; Knuesel, Matthew T; Martinez, Delsy; Asahina, Masako; Aridomi, Haruna; Yamamoto, Keith R
Format Paper
Language English
Published Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 19.09.2019

Summary:
Background: Genetic alteration of candidate response elements at their native chromosomal loci is the only valid determinant of their potential transcriptional regulatory activities. Targeted DNA cleavage by Cas9 coupled with cellular repair processes can produce arrays of alleles that can be defined by massively parallel sequencing by synthesis (SBS), presenting an opportunity to generate and survey edited cell populations that include informative alterations. Such editing efforts commonly rely on subclonal enrichment to isolate cells with preferred genotypic properties at target loci; short nucleotide adducts (indices/barcodes) allow PCR-amplified molecules from diverse sample sources to be pooled, sequenced, and demultiplexed to resolve source-specific content. Not widely available, however, are capabilities for barcoding thousands of clones, or for automated analysis of individual candidate regulatory loci PCR-amplified and sequenced from a genetically heterogeneous population, specifically, imputation of discrete genotype(s) by allele definition and abundance, and identification of altered regulatory factor binding motifs.

Results: We describe a panel of 192 8-nucleotide barcode primers compatible with Illumina sequencing platforms, and the application of these barcodes to genotypic analysis of Cas9-edited clones. Permutations of the ninety-six i7 (read 1) and ninety-six i5 (read 2) barcodes allow unique labeling of up to 9,216 distinct samples. We created three independent Python scripts: SampleSheet.py automates construction of Illumina Sample Sheets encoding up to 9,216 barcode:sample relationships; ImputedGenotypes.py defines alleles and imputes genotypes from demultiplexed fastq files; CollatedMotifs.py flags transcription factor recognition motif matches altered in alleles relative to a reference sequence.

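The 9,216 figure in the Results follows from pairing each of 96 i7 barcodes with each of 96 i5 barcodes (96 × 96). The sketch below only illustrates that arithmetic; it is not the authors' SampleSheet.py. The barcode sequences are generated placeholders rather than the published panel, and the function names `make_placeholder_barcodes` and `pair_barcodes` are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def make_placeholder_barcodes(n, length=8):
    """Return n distinct placeholder barcodes of the given length.

    Assumption: these are arbitrary sequences for illustration only,
    NOT the 8-nucleotide barcodes described in the paper.
    """
    barcodes = []
    for combo in product(BASES, repeat=length):
        barcodes.append("".join(combo))
        if len(barcodes) == n:
            break
    return barcodes


def pair_barcodes(i7_barcodes, i5_barcodes):
    """Return every (i7, i5) combination, one per uniquely labeled sample."""
    return [(i7, i5) for i7 in i7_barcodes for i5 in i5_barcodes]


i7 = make_placeholder_barcodes(96)
i5 = make_placeholder_barcodes(96)
pairs = pair_barcodes(i7, i5)

print(len(pairs))  # 96 * 96 = 9216 distinct barcode pairs
```

Each pair could then be assigned to one sample name when building a sample sheet; how SampleSheet.py actually lays out that file is not described in this record.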
Conclusions: Code-enabled definition of alleles and regulatory motifs in sequenced, demultiplexed amplicons facilitates evaluation of genetic diversity in up to 9,216 distinct samples. Here, we demonstrate the utility of three scripts in analysis of cell populations targeted by Cas9 for disruption of glucocorticoid receptor (GR) binding sites near FKBP5, a GR-regulated gene in the human adenocarcinoma cell line A549. SampleSheet.py, ImputedGenotypes.py, and CollatedMotifs.py operate independently and are broadly applicable beyond the case described here.

Footnotes
* https://github.com/YamamotoLabUCSF
* https://zenodo.org/deposit/3406862
DOI:10.1101/775361