Hello,
I have a protein X with a known protein sequence; both its cDNA and its genomic sequence (including introns and UTRs) are on GenBank.
I have used random mutagenesis to create random mutations in the genomic sequence of protein X, and I wish to use CRISPR-Cas9 to target these mutations. However, Cas9 can only target codons that lie up to 21 nt 5' of a GG (PAM) motif, or up to 21 nt 3' of a CC motif (a PAM on the opposite strand), in the genome. If I want to maximise specificity, the range is only 11 nt 5' or 3', respectively. Hence I am aware that some residues' codons may be too far from a PAM site to mutate using Cas9.
The problem is that I don't know how to determine which residues have codons that are too far (more than 21 or 11 nt) from a PAM motif! If the DNA sequence contained only cDNA, I would have some idea, but to complicate things the genomic sequence contains UTRs and introns, and they may contain PAM sites as well!
How should I go about trying to solve my problem?
Thanks for the help!
You don't mention the organism, so my question is 'is the genome sequence not available for your organism?'
It's available. It's human (H. sapiens); I'm also trying mouse (Mus musculus).
So, shouldn't you use a genome browser to help answer your question?
Yes, I did, but searching manually for PAMs would be too time-consuming. Is there any way I can use software to automate this (e.g. find, BLAST, etc.)?
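One way to automate the PAM search is a short script. The following is only a minimal sketch of the rule as stated in the question, not a validated guide-design tool. The function name `targetable_positions` and the `window` parameter are made up for illustration, and it assumes distances are counted from the first G of a GG (going 5') and from the last C of a CC (going 3'), with 0-based positions on the strand as given in the gDNA record.

```python
def targetable_positions(seq, window=21):
    """Return the set of 0-based positions that lie within `window` nt
    5' of a GG motif or 3' of a CC motif (hypothetical helper)."""
    seq = seq.upper()
    covered = set()
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair == "GG":
            # Positions i-window .. i-1 lie 5' of this GG.
            covered.update(range(max(0, i - window), i))
        elif pair == "CC":
            # Positions i+2 .. i+1+window lie 3' of this CC.
            covered.update(range(i + 2, min(len(seq), i + 2 + window)))
    return covered


# Example: a GG at index 5 with a 3 nt window covers positions 2, 3 and 4.
print(sorted(targetable_positions("AAAAAGGAAAA", window=3)))
```

Because the scan runs over the whole genomic sequence, PAM sites inside introns and UTRs are counted automatically. Passing `window=11` gives the stricter range mentioned in the question.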
If I can get the locations of the CDS within the gDNA sequence, then I can write a script to automate that myself. The problem is that I don't know how to fetch the CDS regions of the gDNA from GenBank or from the genome browser.
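On fetching the CDS regions: a GenBank genomic record normally lists the CDS in its feature table as a location string such as `join(201..300,501..650)`, where each `start..end` pair is one coding segment in 1-based gDNA coordinates. A library such as Biopython can read the record directly; the sketch below uses only the standard library and assumes you have copied the location string out by hand. The function names are hypothetical, the example coordinates are invented, and reverse-strand (`complement(...)`) locations are deliberately not handled. The `covered` argument is whatever set of targetable 0-based gDNA positions you have computed from your PAM rule.

```python
import re


def cds_intervals(location):
    """Parse a GenBank CDS location like 'join(201..300,501..650)' into
    0-based, half-open (start, end) intervals on the gDNA."""
    if "complement" in location:
        raise NotImplementedError("reverse-strand CDS not handled in this sketch")
    return [(int(a) - 1, int(b))
            for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def codon_positions(intervals):
    """Yield (residue_number, [p1, p2, p3]) giving the gDNA positions of
    each complete codon, walking through the coding segments in order."""
    coding = [p for start, end in intervals for p in range(start, end)]
    for k in range(0, len(coding) - 2, 3):
        yield k // 3 + 1, coding[k:k + 3]


def unreachable_residues(intervals, covered):
    """Residues with at least one codon position outside `covered`."""
    return [res for res, pos in codon_positions(intervals)
            if not all(p in covered for p in pos)]


# Toy example: two coding segments; the second codon spans the intron.
exons = cds_intervals("join(1..4,11..12)")
print(list(codon_positions(exons)))
print(unreachable_residues(exons, covered={0, 1, 2, 3, 10}))
```

Mapping codons back to gDNA coordinates, rather than working on the cDNA, is what lets a codon split across an intron be checked against PAM sites in the surrounding intronic sequence.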