Question

Strand-specific quantification of RNA-seq data without strand-aware mapping

0

5 months ago

tdsone • 0

Hi!

I'm trying to produce strand-specific bigwig coverage files for a yeast RNA-seq experiment with an unknown library type.

After mapping the reads against the S. cerevisiae reference genome, STAR returns two coverage files:

In blue: coverage (.bigwig) files/pileups of the aligned reads

Here, you can see that the gene is on the sense strand, but half of the reads are mapped to the antisense strand. (Blue tracks are read pileups/bigwig coverage files.) I assume this is because some reads are PCR amplicons rather than the original RNA and thus map to the wrong strand.

My goal is to produce "correct" coverage files, where the orientation of the annotated gene is taken into account when the coverage profiles are produced. How could I do this?

In this case, for example, I would expect the correct coverage profile to show roughly the sum of both profiles on the upper track (sense strand) and 0 on the antisense strand.

Any help is much appreciated!

Best, Timon

rna-seq yeast • 648 views

ADD COMMENT • link updated 5 months ago by Carlo Yague 8.9k • written 5 months ago by tdsone • 0

0

Two questions: Are your RNA-seq data stranded, and how did you produce the tracks?

ADD REPLY • link 5 months ago by ATpoint 85k

0

Are your RNA-seq data stranded?

I don't know, but I ran salmon to auto-infer the strandedness. It reports "U", so I assume the data is unstranded.

How did you produce the tracks?

Ran Trim Galore! to trim reads.

Used STAR to map the reads against the R64-1-1 reference with the options --outWigType wiggle and --outWigStrand Stranded

ADD REPLY • link updated 5 months ago by GenoMax 147k • written 5 months ago by tdsone • 0

1

If it's unstranded, then it's exactly that. I don't see how any tool could change that, other than by assuming that if a gene is on the reverse strand, then all intersecting signal must be on that strand too. But that's a strong assumption, and a problem when genes overlap. You would need a custom script for it.
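For illustration only, here is a minimal sketch of what such a custom step might look like. It assumes per-base coverage for one chromosome is already loaded as two plain Python lists, and that genes are given as 0-based, half-open (start, end, strand) tuples. The function name and data layout are hypothetical and not part of STAR or any other tool. Positions covered by genes on both strands, or by no gene at all, are left untouched, which is exactly where the assumption breaks down.

```python
def assign_coverage_to_gene_strand(plus_cov, minus_cov, genes):
    """Move all coverage under a gene onto that gene's annotated strand.

    plus_cov, minus_cov: per-base coverage lists of equal length.
    genes: iterable of (start, end, strand), 0-based half-open, strand "+" or "-".
    Positions with no gene, or with genes on both strands, keep their
    original values because the annotation cannot resolve them.
    """
    n = len(plus_cov)
    # Record which annotated strands overlap each position.
    strands_at = [set() for _ in range(n)]
    for start, end, strand in genes:
        for pos in range(max(start, 0), min(end, n)):
            strands_at[pos].add(strand)

    new_plus = list(plus_cov)
    new_minus = list(minus_cov)
    for pos in range(n):
        total = plus_cov[pos] + minus_cov[pos]
        if strands_at[pos] == {"+"}:
            new_plus[pos], new_minus[pos] = total, 0
        elif strands_at[pos] == {"-"}:
            new_plus[pos], new_minus[pos] = 0, total
    return new_plus, new_minus
```

Note that this only relabels signal according to the annotation. It cannot recover real antisense transcription, so the result should not be read as a measurement of strandedness.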

ADD REPLY • link 5 months ago by ATpoint 85k

0

If your data is unstranded then you should have used --outWigStrand Unstranded.

ADD REPLY • link 5 months ago by GenoMax 147k

score 0 · Answer 1 · 2024-06-20

From the read density profile, I think the data is stranded but was sequenced paired-end. The top track shows a 5' bias and the bottom track a 3' bias. This can only be explained by stranded library fragments sequenced from both ends (typically Illumina paired-end sequencing), where the first read of a pair always has the opposite orientation to the second read.
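To make the orientation argument concrete, here is a small sketch of how the strand of the original transcript could be derived from a read's SAM flag. The function name and the read1_is_antisense parameter are made up for this example. The default assumes a dUTP-style library, where read 1 is antisense to the transcript; whether that applies to this dataset is not known from the thread.

```python
# Standard SAM flag bits (from the SAM specification).
PAIRED = 0x1
REVERSE = 0x10
SECOND_IN_PAIR = 0x80


def transcript_strand(flag, read1_is_antisense=True):
    """Return "+" or "-" for the strand the sequenced fragment came from.

    flag: integer SAM flag of one aligned read.
    read1_is_antisense: True for dUTP-style libraries (assumption),
    False for libraries where read 1 matches the transcript.
    """
    on_plus = not (flag & REVERSE)
    # Mate 2 always points the opposite way to mate 1, so flip it
    # to express everything in terms of mate 1's orientation.
    if (flag & PAIRED) and (flag & SECOND_IN_PAIR):
        on_plus = not on_plus
    # In dUTP-style libraries mate 1 is the reverse complement of the RNA.
    if read1_is_antisense:
        on_plus = not on_plus
    return "+" if on_plus else "-"


# Both mates of one pair (flags 83 and 163) resolve to the same strand.
print(transcript_strand(83), transcript_strand(163))
```

A pileup that only looks at the alignment strand of each read would put the two mates of such a pair on different tracks, which would match the split signal described above.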

I believe that STAR's --outWigStrand Stranded does not take into account that paired reads have opposite strand orientations. If that is the case, you should not use it. Instead, I would suggest starting from the BAM file output by STAR and using deepTools bamCoverage with the option --filterRNAstrand forward or --filterRNAstrand reverse.
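As a rough sketch, the two bamCoverage calls could be assembled like this. The BAM name and output prefix are placeholders, and the BAM is assumed to be coordinate-sorted and indexed. As far as I know, the deepTools documentation describes --filterRNAstrand as assuming a dUTP-based library, so the forward and reverse tracks may need to be swapped for other protocols.

```python
import shlex


def bamcoverage_commands(bam, prefix, bin_size=1):
    """Build one bamCoverage command per strand (commands are not executed).

    bam: path to a coordinate-sorted, indexed BAM file (assumed).
    prefix: hypothetical output prefix for the two bigwig files.
    """
    commands = {}
    for strand in ("forward", "reverse"):
        commands[strand] = [
            "bamCoverage",
            "--bam", bam,
            "--outFileName", f"{prefix}.{strand}.bw",
            "--outFileFormat", "bigwig",
            "--filterRNAstrand", strand,
            "--binSize", str(bin_size),
        ]
    return commands


# Print the commands so they can be checked before running them in a shell.
for cmd in bamcoverage_commands("Aligned.sortedByCoord.out.bam", "sample").values():
    print(shlex.join(cmd))
```

Each list can also be passed to subprocess.run(cmd, check=True) once deepTools is installed.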