3' Bias in RNA-seq data
8.6 years ago
mjg ▴ 30

Hello,

I have RNA-seq samples that show 3' bias in the gene body coverage of 10,000 random genes.

I first looked at the RIN values to see if degradation was linked to this, but all my samples have a RIN above 8, so I do not think degradation is the main reason.

I also read that different protocols might show different gene body coverage profiles. In this case we did mRNA-seq (as opposed to total RNA-seq).

However, a biological subgroup of the samples (plus a sample from another group) shows a significantly more pronounced bias.

Has anyone observed this before? Is there any way to correct for this bias in the counts? I am not sure whether this could cause false positives, as all samples of one biological group have the most pronounced bias.

Thanks,

Maria

rna-seq bias qc • 12k views

Were your samples sequenced in different batches? Is the effect specific to one batch? Do you work with a commercial sequencing provider?


"Do you work with a commercial sequencing provider?"

I'm also curious about this. I would hope the provider is capturing this bias during QC and conveying it to the client.


3' end bias is pretty common in mRNA-seq protocols...see http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4310221/figure/F6/


Yes, it's not surprising, but I was more worried by "However, a biological subgroup of the samples (plus a sample from another group) shows a significantly more pronounced bias." This will completely jeopardize your downstream analysis, because you will never know whether the effect you see is caused by the batch effect or by the biological effect.


Thanks all for your comments, and yes this is the main thing that I am worried about. All the libraries were prepared and sequenced in one batch.

I think I will take the genes that are differentially expressed, compute a per-gene coverage distribution for each sample, and see how many of those genes have different gene body coverage between groups.
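A minimal sketch of the per-gene comparison described here, assuming per-base (or per-bin) coverage has already been extracted for each gene and oriented 5' to 3' in transcript direction. The function names, and the choice of "fraction of coverage in the 3' half" as the score, are illustrative assumptions and not something taken from a specific tool.

```python
from statistics import mean


def three_prime_fraction(coverage):
    """Fraction of a gene's total coverage that falls in its 3' half.

    `coverage` is a list of depths ordered 5' -> 3' (transcript orientation).
    Returns None for genes with no coverage, so they can be skipped later.
    """
    total = sum(coverage)
    if total == 0:
        return None
    half = len(coverage) // 2
    return sum(coverage[half:]) / total


def bias_shift(scores_a, scores_b):
    """Difference in mean 3' fraction between two groups for one gene.

    Each argument holds one score per sample; None entries are ignored.
    Returns None if either group has no usable score.
    """
    a = [s for s in scores_a if s is not None]
    b = [s for s in scores_b if s is not None]
    if not a or not b:
        return None
    return mean(a) - mean(b)


# Hypothetical example: one gene, two samples in group A, one in group B.
group_a = [three_prime_fraction([1, 1, 2, 4]), three_prime_fraction([3, 3, 1, 1])]
group_b = [three_prime_fraction([2, 2, 2, 2])]
print(bias_shift(group_a, group_b))
```

Genes whose shift is far from zero are the ones where a coverage-shape difference, rather than expression alone, could be contributing to a differential expression call.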

I've also seen that some people count reads only at the 3' end of the genes; I might try that too.
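For the 3'-end counting idea, one possible sketch is to trim each transcript's exons down to the last N transcript bases before counting. The function name, the 0-based half-open coordinate convention, and the default window of 500 bases are all assumptions made for illustration; the resulting intervals would still need to be written out as an annotation for whatever read counter is used.

```python
def three_prime_exon_window(exons, strand, window=500):
    """Return the exon pieces covering the 3'-most `window` transcript bases.

    `exons` is a list of non-overlapping (start, end) pairs, 0-based and
    half-open, in reference coordinates. `strand` is "+" or "-".
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Walk exons starting from the transcript's 3' end:
    # highest coordinates first on "+", lowest first on "-".
    ordered = sorted(exons, reverse=(strand == "+"))
    remaining = window
    kept = []
    for start, end in ordered:
        if remaining <= 0:
            break
        take = min(end - start, remaining)
        if strand == "+":
            kept.append((end - take, end))
        else:
            kept.append((start, start + take))
        remaining -= take
    return sorted(kept)


# Hypothetical two-exon gene, keeping the last 150 transcript bases.
print(three_prime_exon_window([(100, 200), (300, 400)], "+", window=150))
print(three_prime_exon_window([(100, 200), (300, 400)], "-", window=150))
```

Working in transcript space rather than on the raw genomic span keeps introns out of the window, which matters for genes whose last exon is shorter than the window.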


"This will jeopardize completely your downstream analysis because you will never know if the effect you see is caused by the batch effect or the biological effect."

Just curious, but what sort of biological effect do you think could be causing this kind of bias throughout a sample's entire transcript pool?


I don't think this bias is caused by a biological effect; my guess is that the cells were treated differently (culture conditions or tissue collection).


Thanks, I'll have a look at this paper.


How do we deal with this during analysis? Should we ignore it and proceed?


It's not surprising that you are seeing a 3' bias in read mapping since mRNA sequencing typically involves poly-A capture. Transcripts with degraded 3' ends (which may show a 5' bias in read mapping) will not be captured.

It's not uncommon to see read mapping bias along transcripts even for samples that appear to be of good quality. This usually seems to be an issue with the prep, but a lot of factors can contribute to it. How severe is the bias you're observing?


Thanks for your comment, spvensko. In the more biased samples, coverage at the middle of the gene body is around 0.5 of the highest coverage value, and it goes down towards the 5' end. The other samples have ~0.75 of the highest coverage value at the middle of the gene body, also going down as it approaches the 5' end.
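The numbers quoted here can be read off a gene body coverage profile roughly as follows. This is only a sketch: it assumes the profile is a list of aggregate coverage values binned from 5' to 3' (for example 100 percentile bins), and the function names are made up for illustration.

```python
def normalized_profile(profile):
    """Scale a 5' -> 3' coverage profile so that its highest value is 1.0."""
    peak = max(profile)
    if peak == 0:
        raise ValueError("profile has no coverage")
    return [value / peak for value in profile]


def mid_body_ratio(profile):
    """Coverage at the middle bin as a fraction of the highest coverage value."""
    norm = normalized_profile(profile)
    return norm[len(norm) // 2]


# Hypothetical five-bin profiles for a more biased and a less biased sample.
print(mid_body_ratio([10, 20, 50, 80, 100]))
print(mid_body_ratio([30, 50, 75, 90, 100]))
```

Computing this one number per sample makes it easy to tabulate the bias next to the group labels and see how strongly the two line up.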

8.6 years ago
Chirag Nepal ★ 2.4k

Generally, RNA-seq libraries are prepared using random primers, but sometimes oligo-dT primers are used instead. Using oligo-dT primers in RNA-seq library preparation enriches reads towards the 3' ends of genes. So check how your library was prepared. I cannot remember exactly, but there is a study that systematically compares random-primed and oligo-dT-primed libraries, I think in Genome Biology (though I am not sure).
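A toy model can illustrate why oligo-dT priming would enrich reads at the 3' end. It assumes, as a deliberate simplification, that reverse transcription starts at the poly-A tail and stops independently at each base with a fixed probability; `stop_prob` is a hypothetical parameter, not a measured quantity.

```python
def expected_coverage(length, stop_prob):
    """Expected relative cDNA coverage along a transcript, ordered 5' -> 3'.

    A base that is d positions upstream of the 3' end is only covered if
    synthesis survived d extension steps, i.e. with probability
    (1 - stop_prob) ** d.
    """
    if not 0 <= stop_prob < 1:
        raise ValueError("stop_prob must be in [0, 1)")
    return [(1 - stop_prob) ** (length - 1 - i) for i in range(length)]


# Coverage rises towards the 3' end; a larger stop_prob gives a steeper profile.
print(expected_coverage(5, 0.1))
print(expected_coverage(5, 0.3))
```

Under this model, longer transcripts and less processive reactions both produce a stronger drop-off towards the 5' end.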


Thanks Chirag. We did mRNA-seq, and yes, it seems that this protocol generally gives this profile. My problem is mainly that one biological group shows a more pronounced bias, so I wonder whether downstream analysis will be affected by this.


Can you elaborate on why using poly-A in RNA-seq library preparation enriches the 3' end? What are the reasons?
