Is there a way to computationally validate called variants from transcriptomic/RNA-seq data? I understand that GATK recommends hard filtering since VQSR cannot be done for RNA but are there any other alternatives?
I would shift focus and treat this as genotyping rather than de novo variant calling. Any novel variant has a higher prior probability of being an artifact than of being real; you can look up accuracy and false-positive rates for different variant callers, but this statement holds even if the FP rate is below 1%, which is unlikely for transcriptome-based variant calling. Estimate how many novel mutations you would expect to see; I'd say any number exceeding 10-100 indicates filtering that is too lax. For a better estimate, calculate the expected number of mutations from the mutation rate (on the order of 1E-8 to 1E-10 per site per generation) and the number of generations since your strain diverged (admittedly a vague quantity). Comparing these estimates even roughly, you will see that the probability of a true novel mutation arising at any given position can be several orders of magnitude lower than the FP rate. Without further information, the wisest strategy (at least in terms of likelihood ratios) may therefore be to ignore any novel variant.
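The back-of-the-envelope comparison above can be sketched as follows. All parameter values here are illustrative assumptions (mutation rate, generations since divergence, number of callable sites, and caller FP rate are not from any specific dataset), but the arithmetic shows why false positives can dominate:

```python
# Compare expected true novel mutations vs. expected false positives.
# Every value below is an assumed, illustrative parameter.

mutation_rate = 1e-8       # mutations per site per generation (assumed)
generations = 100          # generations since strain divergence (assumed)
callable_sites = 3e7       # sites with usable RNA-seq coverage (assumed)
fp_rate = 1e-3             # per-site false-positive rate of the caller (assumed)

# Expected count of genuinely new mutations across callable sites
expected_true = mutation_rate * generations * callable_sites

# Expected count of spurious calls across the same sites
expected_fp = fp_rate * callable_sites

# Rough odds: false calls per true novel mutation
fp_per_true = expected_fp / expected_true

print(f"expected true novel mutations: {expected_true:.0f}")   # 30
print(f"expected false positives:      {expected_fp:.0f}")     # 30000
print(f"false positives per true call: {fp_per_true:.0f}")     # 1000
```

Even with these fairly generous assumptions, a "novel" call is about a thousand times more likely to be an artifact than a real mutation, which is the point of the likelihood-ratio argument.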
First, annotate your variants with GATK's VariantAnnotator: https://gatk.broadinstitute.org/hc/en-us/articles/13832654601755-VariantAnnotator. Then separate your call set into known (annotated) and novel variants and proceed from there, depending on what you are interested in.
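One simple way to do the known/novel split after annotation is to check the ID column of the VCF: records annotated with a dbSNP rsID are "known", records whose ID is "." are "novel". A minimal sketch (the tiny inline VCF is made up for illustration; in practice you would stream your annotated VCF file):

```python
# Split VCF records into "known" (ID column has an rsID) and
# "novel" (ID column is "."). The inline VCF below is a made-up example.

vcf_text = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t12345\trs123\tA\tG\t50\tPASS\t.
1\t22345\t.\tC\tT\t40\tPASS\t.
2\t333\trs456\tG\tA\t60\tPASS\t.
"""

known, novel = [], []
for line in vcf_text.splitlines():
    if line.startswith("#"):        # skip header and meta lines
        continue
    fields = line.split("\t")
    # fields[2] is the ID column of the VCF record
    (known if fields[2] != "." else novel).append(fields)

print(f"known: {len(known)}, novel: {len(novel)}")  # known: 2, novel: 1
```

The same split can be done on the command line with tools like bcftools, but the logic is the one shown here: anything without a database ID goes into the "novel" bin, which per the answer above you should treat with heavy skepticism.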