I targeted a protein-coding exon with Cas9, amplified a ~150 bp product spanning the region, and sequenced it on an Illumina MiSeq. I now have >1000X coverage of the region, mapped to my reference genome. Many of these reads carry indels, which potentially induce a frameshift (whenever the indel length is not a multiple of 3).
Is there any tool available to predict premature stop codons in my reads (or even nonsense-mediated decay)?
I should add that I do not want to create an assembly/consensus of my reads, because the mutations induced by Cas9 are mosaic (e.g. half of my reads have a 1 bp deletion, the other half have a 9 bp insertion). In the simplest form, I would want a tool that scans each of my mutated reads for stop codons in the correct frame (i.e. the frame used by the cell to translate that specific mRNA).
Call variants (VCF) and annotate them with VEP. The consequences stop_gained and NMD_transcript_variant might be of interest to you, as might others such as transcript_ablation and frameshift_variant. https://asia.ensembl.org/info/genome/variation/prediction/predicted_data.html
Thanks, that's very helpful, but I don't think a variant caller will do the job I need. I would essentially need a variant call on each individual read, if that makes sense?
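A per-read call of this kind can be sketched directly from the alignments, without any consensus step. The snippet below is only an illustration under stated assumptions: the `(pos, cigar)` pairs are assumed to come from the POS and CIGAR columns of a SAM file, and the function names (`indels_from_cigar`, `net_shift`, `tally`) and the example reads are made up for this post. It lists the indels in each read, counts how many reads share the same indel signature (which keeps the mosaic alleles separate), and flags whether the net length change is a multiple of 3.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def indels_from_cigar(pos, cigar):
    """List (ref_pos, op, length) for every insertion/deletion in one read.

    pos is the 1-based leftmost mapping position (SAM POS field).
    For a deletion, ref_pos is the first deleted reference base;
    for an insertion, it is the reference base right after the insertion.
    """
    events = []
    ref_pos = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "ID":
            events.append((ref_pos, op, length))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += length
    return events

def net_shift(events):
    """Net length change in bp: insertions add, deletions subtract."""
    return sum(length if op == "I" else -length for _, op, length in events)

def tally(reads):
    """Count reads per indel signature; reads is an iterable of (pos, cigar)."""
    return Counter(tuple(indels_from_cigar(p, c)) for p, c in reads)

# Made-up example: two reads with a 1 bp deletion, one with a 9 bp insertion.
example_reads = [(100, "70M1D79M"), (100, "70M1D79M"), (100, "70M9I71M")]
for signature, n_reads in tally(example_reads).items():
    status = "frameshift" if net_shift(signature) % 3 else "in-frame"
    print(n_reads, signature, status)
```

Each distinct signature could then be written out as its own variant and annotated separately, with the read count serving as a rough allele frequency.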
I just had a shot with a single read. I wrote the variant out manually (it is just a 10 bp deletion in the middle of the read). It does a pretty good job for part of it: it looks up which transcript the variant affects and calls a frameshift. I wish it would go on to transcribe/translate downstream in the new frame to see whether a premature stop codon pops up, though. Any clue why it does not do that? Am I missing something?
You generally expect any frameshift to run into a stop codon soon (3 of the 64 codons are stops, so in random sequence one turns up about every 21 codons on average), but I'm not aware of any annotation software that formally assesses that.
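That said, the check itself is simple enough to sketch by hand. The following is a minimal illustration, not an existing tool: it assumes each read is given on the coding strand, lies entirely within the exon, and that you know how many bases to skip at the start of the read to reach the first complete codon (`frame_offset`, a hypothetical parameter you would have to derive from the transcript annotation). The function name and the example sequences are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def first_stop(read_seq, frame_offset):
    """Return the 0-based index in the read of the first in-frame stop codon.

    frame_offset (0, 1 or 2) is the number of bases to skip at the read
    start to reach the first complete codon. Returns None if no stop codon
    is found; codons containing N or other ambiguity codes are skipped.
    """
    seq = read_seq.upper()
    for i in range(frame_offset, len(seq) - 2, 3):
        if CODON_TABLE.get(seq[i:i + 3]) == "*":
            return i
    return None

# Made-up example: the wild-type read has no stop codon in frame 0,
# while deleting one base (the T at index 3) brings TAA into frame.
wild_type = "GCTTGGATAAAGCCC"
deletion_read = "GCTGGATAAAGCCC"
print(first_stop(wild_type, 0), first_stop(deletion_read, 0))
```

Because an indel shifts everything downstream of it, scanning the mutated read in the original frame is enough; there is no need to work out the new frame explicitly. Keep in mind that a 150 bp read only covers about 50 codons, so a frameshift with no stop inside the read may still hit one further downstream. To catch those, you would need to append the downstream coding sequence from the reference to each read before scanning.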
Right, I see. It's a shame, it'd be nice data to have!
Cross post at https://bioinformatics.stackexchange.com/questions/7340/how-to-predict-stop-codons-in-illumina-reads
Yes, I wrote that. I did not get satisfying answers, though. Is that an issue, do you think? Should I delete the Stack Exchange post?
Not necessarily, but it's best to be open about that.