2.8 years ago
nicolo.gualandi
Hi,
I have called SNPs and INDELs starting from stranded RNA-seq data. I have seen many papers and tools stating that it is possible to detect the strand on which a variant (SNP or INDEL) occurs. My question is: is this also possible when calling variants from RNA-seq data rather than DNA-seq data?
Thank you for the help
By strand, do you mean top or bottom strand? That would make no sense, as DNA is double-stranded (and RNA is based on DNA in the transcriptional sense), so any variant always occurs on both strands. Can you link a reference?
Thank you for your reply.
For example, in the MutationalPatterns software (the manual is here: https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html), in the section "Strand bias analyses", they report that "For the mutations within genes it can be determined whether the mutation is on the transcribed or non-transcribed strand". How is this possible? Do you think this analysis could also be performed using variants called from stranded RNA-seq data?
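To make the quoted sentence concrete, here is a minimal sketch of how such a label could be assigned. It assumes the convention, as I understand it from the vignette, that substitutions are described from the pyrimidine (C or T) of the mutated base pair, and that annotated gene strands give the sense (untranscribed) strand. The function name `transcriptional_strand` is made up for this illustration and is not part of MutationalPatterns.

```python
PYRIMIDINES = {"C", "T"}


def transcriptional_strand(ref_base, gene_strand):
    """Label a substitution inside a gene as 'transcribed' or 'untranscribed'.

    ref_base    -- reference base as reported on the forward (+) genome strand
    gene_strand -- '+' or '-' from the gene annotation (sense strand)

    Assumed convention: the substitution is described from its pyrimidine.
    If that pyrimidine lies on the same strand as the gene annotation, the
    label is 'untranscribed'; otherwise it is 'transcribed'.
    """
    ref_base = ref_base.upper()
    if ref_base not in ("A", "C", "G", "T"):
        raise ValueError("ref_base must be one of A, C, G, T")
    if gene_strand not in ("+", "-"):
        raise ValueError("gene_strand must be '+' or '-'")

    # A purine on the + strand means the pyrimidine sits on the - strand.
    pyrimidine_strand = "+" if ref_base in PYRIMIDINES else "-"
    if pyrimidine_strand == gene_strand:
        return "untranscribed"
    return "transcribed"
```

As sketched, the label uses only the reference base of the variant and the annotated strand of the overlapping gene.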
Thanks
Ah, that is what you mean. The idea is that if a variant can only be called from reads that align to either the top or the bottom strand, this is a strand bias and indicates a false call. No, I think you probably cannot get that from RNA-seq, at least not from stranded preps. That is one of the reasons why RNA-seq is not really meant for variant calling.
So it is only a matter of bias? It has nothing to do with biology?
No biology. It is a QC metric, or rather a "heuristic" to judge variant quality/reliability.
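As a rough illustration of this kind of heuristic, the sketch below runs a two-sided Fisher's exact test on the counts of reference and alternate reads per strand and converts the p-value to a Phred scale. This is one common way to quantify strand bias, not necessarily what any particular variant caller does internally. The function names and the example counts are invented for this post.

```python
from math import comb, log10


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact p-value for the 2x2 table
    [[ref_fwd, ref_rev], [alt_fwd, alt_rev]]."""
    row1 = ref_fwd + ref_rev
    row2 = alt_fwd + alt_rev
    col1 = ref_fwd + alt_fwd
    total = row1 + row2
    if total == 0:
        return 1.0
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x forward reads among the ref reads.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(ref_fwd)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def phred_scaled(p_value):
    """Convert a p-value to a Phred-scaled score (higher = stronger bias)."""
    if p_value <= 0.0:
        return float("inf")
    return -10.0 * log10(p_value)


# Hypothetical counts: alt allele seen on forward-strand reads only.
p_biased = fisher_exact_two_sided(20, 20, 15, 0)
# Hypothetical counts: alt allele seen equally on both strands.
p_balanced = fisher_exact_two_sided(10, 10, 10, 10)
```

A large Phred-scaled value for the first table would flag the call as suspicious, while the balanced table gives no evidence of bias.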
If one has paired-end RNA-Seq they'll have reads aligning on both strands.
Though indeed most RNA-Seq studies employ single-end, stranded protocols. Many variant callers can compute some sort of strand bias metric; for example, bcftools can produce the SP tag (Strand Bias p-value) that you can filter your data with: strand bias filtering with bcftools
Here is an early paper on the topic:
The effect of strand bias in Illumina short-read sequencing data
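For anyone who wants to try the SP-based filtering mentioned above outside of bcftools, here is a minimal plain-Python sketch. It assumes the VCF already carries a per-sample FORMAT/SP value (as far as I recall, bcftools mpileup only writes it when asked, e.g. with `-a SP`). The function names, the cutoff of 30 and the choice to keep records without SP are all my own assumptions, not recommendations.

```python
def sample_sp(vcf_line, sample_index=0):
    """Return the FORMAT/SP value of one sample as a float, or None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10 + sample_index:
        return None  # no FORMAT/sample columns for this sample
    keys = fields[8].split(":")
    values = fields[9 + sample_index].split(":")
    if "SP" not in keys:
        return None
    i = keys.index("SP")
    if i >= len(values) or values[i] == ".":
        return None
    return float(values[i])


def filter_by_sp(lines, max_sp=30.0):
    """Yield header lines and records whose Phred-scaled SP is <= max_sp.

    Records without an SP value are kept (an arbitrary choice for this sketch).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        sp = sample_sp(line)
        if sp is None or sp <= max_sp:
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=30\tGT:PL:SP\t0/1:60,0,60:2\n",
    "chr1\t200\t.\tC\tT\t50\t.\tDP=30\tGT:PL:SP\t0/1:60,0,60:45\n",
]
kept = list(filter_by_sp(example, max_sp=30.0))
```

Here the record at position 200 is dropped because its SP exceeds the cutoff, while the headers and the record at position 100 pass through.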