Hello,
Suppose you have a file with, say, 2M reads from Illumina sequencing.
You also have a region of interest and want to know whether it was modified by Cas in a CRISPR-Cas experiment. If Cas is inactive, the region should stay intact; if Cas is active, the region will carry modifications, either nucleotide changes or small deletions.
My question is how to tell whether a variation I observe is actually caused by Cas or is just a sequencing error.
My approach is to map the reads to my reference construct, then take the region of interest and examine the variation across all reads. I have read that the Illumina sequencing error rate is conventionally taken to be ~0.1%. Does that mean that, for any given base in my 20-nt region of interest, I can expect 0.1% of the reads to show a mismatch relative to the expected nucleotide (i.e. the one in my reference construct)?
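To make the arithmetic behind my assumption explicit, here is a minimal sketch in Python. It assumes that the error rate is a uniform 0.1% at every base, that errors are independent between reads, and that all 2M reads cover the region; none of these is guaranteed to hold for real data. The function names are just for illustration.

```python
def expected_mismatches(depth, error_rate):
    """Expected number of reads showing a mismatch at ONE position,
    assuming every read covers it and errors are independent."""
    return depth * error_rate


def prob_read_has_error(region_len, error_rate):
    """Probability that a single read carries at least one error
    somewhere in a region of region_len bases."""
    return 1.0 - (1.0 - error_rate) ** region_len


# Assumed values taken from my question: 2M reads, ~0.1% error, 20-nt region.
depth = 2_000_000
error_rate = 0.001
region_len = 20

# About 2,000 mismatching reads per position from sequencing error alone.
print(expected_mismatches(depth, error_rate))
# Roughly 2% of reads would have at least one error within the 20 nt.
print(prob_read_has_error(region_len, error_rate))
```

So, if my assumption is right, even an unmodified region would show on the order of 2,000 mismatching reads at each position.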
Is this a correct assumption to make? If not, how can I evaluate whether a variation I observe at a given position among these 20 nt comes from sequencing error or is actually a result of Cas modification?
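For example, one naive check I could imagine is a one-sided binomial test at each position: given the read depth and an assumed background error rate, how likely is it to see at least the observed number of mismatches by chance? The sketch below is only that, a sketch. It assumes a uniform, independent 0.1% error rate (which may be exactly the wrong assumption), and the depth and mismatch count are made-up numbers, not results from my data. `binom_tail` is a hypothetical helper written with the standard library only.

```python
import math


def binom_log_pmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term.

    Stops early once the terms past the mean become negligible.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_log_pmf(i, n, p))
        total += term
        if i > n * p and term <= total * 1e-17:
            break
    return min(total, 1.0)


# Hypothetical numbers for one position (NOT real data):
depth = 2_000_000        # reads covering the position
mismatches = 2_300       # reads that differ from the reference there
error_rate = 0.001       # assumed background sequencing error rate

p_value = binom_tail(mismatches, depth, error_rate)
print(f"P(at least {mismatches} mismatches by chance) = {p_value:.3g}")
```

A small probability would suggest more mismatches than sequencing error alone explains, but only if the assumed background rate is realistic for that position. Is something along these lines reasonable, or is there a better way to get the background?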
Thank you!