Dear all, I am learning RNA sequencing and want to understand some basics. RNA-Seq data can contain sequencing errors, but it also contains SNPs. How do I differentiate between the two, or is there another way to figure this out?
It depends on what you are looking at. For instance, distinguishing SNVs from sequencing errors in RNA-Seq data is very different for germline data than for a tumour with somatic mutations present.

As @WouterDeCoster stated in his comment, some motifs (homopolymer tracts, for instance) tend to be more error-prone, so the "variant" can appear in a relatively high fraction of your reads, enough to pass thresholds, and still be a sequencing artifact. You usually only identify these by flagging problematic regions or by looking at them by eye.

Otherwise, your first pass for distinguishing them essentially comes down to a combination of depth of coverage (specifically the fraction of all reads that support the variant), base quality scores, and the variant quality score (which takes both of the previous factors into account when it is calculated).

If you are not sequencing a tumour, then variants will fall into allele-fraction ranges (and have calculated genotype probabilities) consistent with heterozygous or homozygous changes. However, as stated, there are problematic regions where you can get false positive calls regardless.

If you are sequencing tumours this is harder, as you are often looking for low-frequency variants, so only a small number of reads may support any given variant. You will often see the same false positive sequencing artifacts recur, so sequencing many samples can help screen them out.
In both cases, variant callers weed out a lot of false positives based on base qualities, which tend to be lower for artifacts.
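To make the first-pass logic concrete, here is a minimal Python sketch for the germline case. It is not how any particular variant caller works; the `SiteCounts` class, the `classify_site` function, and every threshold (minimum depth, minimum supporting reads, Phred 20, and the 0.3-0.7 and 0.9 allele-fraction windows) are illustrative assumptions that you would need to tune for your own data.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Read evidence at one candidate site (hypothetical structure)."""
    ref_reads: int                # reads supporting the reference base
    alt_reads: int                # reads supporting the variant base
    mean_alt_base_quality: float  # mean Phred quality of the variant bases


def classify_site(site, min_depth=10, min_alt_reads=3, min_base_quality=20.0):
    """Rough germline first pass; all thresholds are assumed, not standard."""
    depth = site.ref_reads + site.alt_reads
    # Too little coverage or too few supporting reads to say anything.
    if depth < min_depth or site.alt_reads < min_alt_reads:
        return "insufficient_evidence"
    # Artifacts tend to be supported by lower-quality bases.
    if site.mean_alt_base_quality < min_base_quality:
        return "likely_artifact"
    # Fraction of all reads that support the variant.
    vaf = site.alt_reads / depth
    if 0.3 <= vaf <= 0.7:
        return "consistent_with_het"
    if vaf >= 0.9:
        return "consistent_with_hom_alt"
    # Outside the expected germline ranges: inspect by eye.
    return "ambiguous"


print(classify_site(SiteCounts(ref_reads=20, alt_reads=18, mean_alt_base_quality=35.0)))
print(classify_site(SiteCounts(ref_reads=95, alt_reads=5, mean_alt_base_quality=12.0)))
```

A site that lands in "ambiguous" is not necessarily an error; in a tumour sample that low-fraction range is exactly where real somatic variants sit, which is why the germline windows cannot simply be reused there.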
Recurrence. If multiple reads share the same specific mismatch, it is very unlikely to be a sequencing error.
That's not entirely correct. Some specific motifs are known to be more error-prone, which can lead to systematic sequencing errors.
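One way to act on this is to flag candidate sites that sit in or next to an error-prone motif before trusting recurrence alone. Below is a small sketch for homopolymer tracts only; the function names, the minimum run length of 5, and the 1-base flank are assumptions chosen for illustration, not established cut-offs.

```python
from itertools import groupby


def homopolymer_runs(seq, min_run=5):
    """Return (start, end) half-open intervals of runs of one repeated base."""
    runs = []
    start = 0
    for _base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_run:
            runs.append((start, start + length))
        start += length
    return runs


def near_homopolymer(seq, pos, min_run=5, flank=1):
    """True if 0-based position pos is inside a run or within flank bases of it."""
    return any(
        start - flank <= pos < end + flank
        for start, end in homopolymer_runs(seq, min_run)
    )


reference = "ACGTAAAAAAGCT"  # toy sequence with a 6-base A tract at positions 4-9
print(homopolymer_runs(reference))
print(near_homopolymer(reference, 10))
print(near_homopolymer(reference, 1))
```

A flagged site is not automatically wrong; the flag only marks it as a place where a mismatch shared by many reads deserves a closer look.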