Question

Is it possible to infer the strand of a variation called using RNA-seq reads?

Asked 2.7 years ago by nicolo.gualandi

Hi,

I have called SNPs and INDELs from stranded RNA-seq data. I have seen many papers and tools stating that it is possible to detect the strand on which a variant (SNP or INDEL) occurs. My question is: is this also possible when variants are called from RNA-seq data rather than DNA-seq data?

Thank you for the help

Tags: RNAseq, variant, strand, GATK, calling

By strand, do you mean the top or bottom strand? That would make no sense, as DNA is double-stranded (and RNA is transcribed from DNA), so any variant always occurs on both strands. Can you link a reference?

Reply by ATpoint, 2.7 years ago

Thank you for your reply.

For example, see the MutationalPatterns software (manual: https://bioconductor.org/packages/release/bioc/vignettes/MutationalPatterns/inst/doc/Introduction_to_MutationalPatterns.html), section "Strand bias analyses". It states that "For the mutations within genes it can be determined whether the mutation is on the transcribed or non-transcribed strand". How is this possible? Do you think this analysis could also be performed using variants called from stranded RNA-seq data?
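
As far as the MutationalPatterns vignette describes it, the convention is: base substitutions are written with a pyrimidine (C or T) reference base, and a substitution whose pyrimidine lies on the same strand as the gene annotation (the coding, non-template strand) is called "untranscribed", otherwise "transcribed". A minimal Python sketch of that rule, assuming a VCF-style reference base reported on the + strand and a gene strand taken from an annotation such as a GTF; the function names are hypothetical and are not the package's (R) API.

```python
# Sketch of the transcribed/untranscribed convention described in the
# MutationalPatterns vignette (hypothetical function names, not its API).
# Assumptions: ref_base is given on the + strand of the genome (as in a VCF),
# and gene_strand ("+" or "-") comes from a gene annotation.

PYRIMIDINES = {"C", "T"}


def pyrimidine_strand(ref_base: str) -> str:
    """Genomic strand carrying the pyrimidine (C/T) of the mutated base pair."""
    base = ref_base.upper()
    if base not in {"A", "C", "G", "T"}:
        raise ValueError(f"not a single nucleotide: {ref_base!r}")
    # REF C or T: pyrimidine is on +. REF A or G: its complement (T or C) is on -.
    return "+" if base in PYRIMIDINES else "-"


def transcriptional_strand(ref_base: str, gene_strand: str) -> str:
    """Classify an SNV inside a gene as 'untranscribed' or 'transcribed'."""
    if gene_strand not in {"+", "-"}:
        raise ValueError(f"gene strand must be '+' or '-', got {gene_strand!r}")
    # Gene annotations report the coding (sense) strand, which is not the
    # template for transcription, so a pyrimidine on it is "untranscribed".
    if pyrimidine_strand(ref_base) == gene_strand:
        return "untranscribed"
    return "transcribed"


if __name__ == "__main__":
    print(transcriptional_strand("C", "+"))
    print(transcriptional_strand("G", "+"))
    print(transcriptional_strand("G", "-"))
```

Note that this classification only uses the variant's reference base and the strand of the overlapping gene; it does not look at which strand the supporting reads aligned to.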

Thanks

Reply by nicolo.gualandi, 2.7 years ago

Ah, that is what you mean. The concept behind it is that if a variant can only be called from reads aligning to the top or the bottom strand, this is a strand bias and indicates a false call. No, I think you probably cannot get that from RNA-seq, at least not from stranded preps. That is one of the reasons why RNA-seq is not really meant for variant calling.

Reply by ATpoint, 2.7 years ago

So it is only a matter of bias, and has nothing to do with biology?

Reply by nicolo.gualandi, 2.7 years ago

No biology. It is a QC metric, or rather a "heuristic" to judge variant quality/reliability.
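
The heuristic can be made concrete. GATK's FS (FisherStrand) annotation, for example, is a Phred-scaled p-value from Fisher's exact test on reference and alternate read counts split by the strand the reads aligned to. Below is a minimal, self-contained Python sketch of that idea; it is not GATK's or bcftools' implementation, and the function names are hypothetical.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    # Range of top-left values compatible with the fixed margins.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    observed = prob(a)
    # Sum all tables no more probable than the observed one (with a small
    # relative tolerance for floating-point ties).
    p = sum(q for q in probs if q <= observed * (1 + 1e-7))
    return min(1.0, p)


def phred_strand_bias(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int) -> float:
    """Phred-scaled strand-bias score: rows are ref/alt, columns are forward/reverse."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    if p <= 0.0:
        return float("inf")
    return max(0.0, -10 * math.log10(p))


if __name__ == "__main__":
    print(phred_strand_bias(10, 10, 5, 5))   # balanced strands: score near 0
    print(phred_strand_bias(10, 10, 10, 0))  # alt seen only on forward reads
    print(phred_strand_bias(12, 0, 8, 0))    # all reads on one strand
```

In the last call every read is on the forward strand, so the reverse column is all zeros, the test returns p = 1 and the score is 0: when all reads from a gene align to one strand, as with a single-end stranded library, this kind of metric carries no information.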

Reply by ATpoint, 2.7 years ago

If one has paired-end RNA-Seq, one will have reads aligning to both strands.
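
This can be seen from the SAM FLAG field. A minimal Python sketch, assuming a dUTP-style ("reverse-stranded") library in which read 1 and every single-end read are antisense to the RNA; the function names are hypothetical, and the flags 83/163 are just typical values for a properly paired fragment.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1
REVERSE = 0x10
READ1 = 0x40


def alignment_strand(flag: int) -> str:
    """Genomic strand the read itself aligned to."""
    return "-" if flag & REVERSE else "+"


def transcript_strand_dutp(flag: int) -> str:
    """Strand of the originating RNA, ASSUMING a dUTP-style (reverse-stranded) library.

    In such libraries, single-end reads and read 1 of a pair are antisense to
    the RNA, while read 2 is sense.
    """
    read_strand = alignment_strand(flag)
    is_first_read = not (flag & PAIRED) or bool(flag & READ1)
    if is_first_read:
        return "-" if read_strand == "+" else "+"
    return read_strand


if __name__ == "__main__":
    # One fragment from a "+" strand gene: the two mates align to opposite strands...
    for flag in (83, 163):
        print(flag, alignment_strand(flag), transcript_strand_dutp(flag))
    # ...whereas single-end reads from that gene all align to "-".
    print(16, alignment_strand(16), transcript_strand_dutp(16))
```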

That said, most RNA-Seq studies do employ single-end, stranded protocols. Many variant callers can compute some sort of strand bias metric; for example, bcftools can produce the SP tag (Phred-scaled strand bias p-value), which you can filter your data with:

strand bias filtering with bcftools
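
In bcftools, SP is a per-sample FORMAT field (requested from bcftools mpileup with its annotation option, e.g. -a FORMAT/SP), and on real data an -i/-e expression in bcftools filter is the natural tool for this. Purely to illustrate what such a filter does, here is a minimal Python sketch over one plain-text VCF record; the cutoff of 20, the function names and the example record are made up for illustration, not a recommended threshold.

```python
from typing import Optional


def max_sample_sp(vcf_line: str) -> Optional[float]:
    """Largest per-sample FORMAT/SP value in one VCF data line, or None if absent."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT column or no samples
    keys = fields[8].split(":")
    if "SP" not in keys:
        return None
    idx = keys.index("SP")
    values = []
    for sample in fields[9:]:
        parts = sample.split(":")
        # Trailing FORMAT values may be omitted, and "." marks a missing value.
        if idx < len(parts) and parts[idx] not in ("", "."):
            values.append(float(parts[idx]))
    return max(values) if values else None


def passes_strand_bias(vcf_line: str, max_sp: float = 20.0) -> bool:
    """Keep a record unless some sample's Phred-scaled SP exceeds max_sp (hypothetical cutoff)."""
    sp = max_sample_sp(vcf_line)
    return sp is None or sp <= max_sp


if __name__ == "__main__":
    # Made-up two-sample record for illustration.
    record = "chr1\t1000\t.\tA\tG\t50\t.\tDP=30\tGT:PL:SP\t0/1:60,0,80:3\t0/1:55,0,70:25"
    print(max_sample_sp(record), passes_strand_bias(record))
```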

Here is an early paper on the topic:

The effect of strand bias in Illumina short-read sequencing data

https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-666

Reply by Istvan Albert, 2.7 years ago