Transcript coverage 3' bias: is it really that decisive?
11 days ago
Amb@r85 ▴ 10

Hello everybody!

I am new to this bioinformatics world and I have a question. My samples were of good quality from the start, with RIN values generally above 7 (the lowest was 5.8) and my best samples above 8. I ran all the quality controls, trimmed with Trimmomatic and mapped with STAR. My Phred scores were above 36 and in general I had no major issues... until someone suggested cross-checking with another software (CLC GW), which showed a 3' transcript coverage bias, but only for lengths above 5000. The rest of the lengths are OK. Now it is being suggested that my samples are not good quality, and I am checking whether the STAR alignments show the same behavior. Do you really think this is decisive enough to discard even my samples with a RIN above 8.5? If so, how can I trust the metrics that say my samples are OK if this downstream coverage detail suggests the opposite? Any thoughts?

mappings transcriptome RNA • 608 views

If I understand correctly, you're proposing that more 3' coverage means bad sample quality?

I have no reason to suspect that to be true. It could just be because you're using oligo-dT primers (which prime from the polyA tail) right?


Yes, that's exactly it. However, the technician and the other staff are suggesting these samples will show a bias in the DEG analysis. I don't think this is accurate, but I have nothing to back up my view and I haven't been able to find any article on this issue. The technicians are suggesting that my RNA was fragmented by my method, but I don't think that can be the issue: if it were, all the samples would show the same behavior, which is not happening.

11 days ago

3' bias is very common in poly-A selected samples. Even if the RIN is high going into library prep, fragmentation of the RNA during poly-A enrichment can still lead to a 3' bias. Further, you say that this bias is really on transcripts more than 5 kb in length. If this is the mature length, then that is on the longer side for coding RNAs. It could be that this is caused by the inclusion of mis-annotated transcripts that include spurious 5' UTRs, or even fusion transcripts annotated from the detection of read-through transcripts spanning two adjacent genes.

What is important in terms of DEG analysis is that the bias is consistent across samples. DEG analysis relies on there being the same qualitative relationship between read count and gene expression in each sample. If there is 3' bias, this should not affect that, as long as the 3' bias is present in all samples.
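One way to put a number on "consistent across samples" is sketched below in Python. It assumes you already have, for each sample, a gene-body coverage profile as a list of values ordered 5' to 3' (for instance 100 percentile bins), however it was obtained. The function names, the 20% head/tail windows, and the 1.5-fold tolerance are all illustrative assumptions, not an established threshold.

```python
from statistics import mean


def three_prime_bias(profile, tail_fraction=0.2):
    """Ratio of mean coverage in the 3'-most bins to the 5'-most bins.

    `profile` is a list of coverage values ordered 5' -> 3'.
    Values well above 1 indicate 3'-skewed coverage.
    """
    k = max(1, int(len(profile) * tail_fraction))
    head = mean(profile[:k])   # 5' end
    tail = mean(profile[-k:])  # 3' end
    if head == 0:
        return float("inf")
    return tail / head


def bias_is_consistent(profiles, max_fold_spread=1.5):
    """Return per-sample bias ratios and whether they agree.

    `profiles` maps sample name -> coverage profile. Samples are called
    consistent when the largest ratio is at most `max_fold_spread` times
    the smallest (an arbitrary, assumed tolerance).
    """
    ratios = {name: three_prime_bias(p) for name, p in profiles.items()}
    lo, hi = min(ratios.values()), max(ratios.values())
    consistent = lo > 0 and hi / lo <= max_fold_spread
    return ratios, consistent
```

If the ratios agree, the bias is a shared property of the library prep rather than a difference between samples; a single outlier sample would be the one to look at more closely.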

In conclusion, you are right that binning an experiment because the samples show 3' bias is too stringent a quality control measure.


Thank you so much! I was starting to feel I was going crazy over something that is not really mentioned or given much importance in the papers. I will check that consistency and also look into the spurious 5' UTRs you mentioned, as I haven't seen them in the literature I found.


In terms of the 5' UTRs, I'd check some of the transcripts over 5 kb. What you should see is a consistent trailing off of coverage across most transcripts, rather than good coverage up to a point and then a drop to zero in some transcripts and a continuation of coverage in others.
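As a rough illustration of that check, here is a small Python heuristic that labels a single transcript's binned coverage (ordered 5' to 3') as trailing off gradually or dropping off abruptly. The function name and the 0.5 cutoff are assumptions made for this sketch, and eyeballing the transcripts in a genome browser is probably just as informative.

```python
from statistics import median


def classify_five_prime_dropoff(bins, abrupt_fraction=0.5):
    """Label a transcript's binned coverage (ordered 5' -> 3').

    "abrupt":    an uncovered 5' stretch is followed directly by coverage
                 close to the transcript's typical level (a cliff), as
                 expected if the annotated 5' end is spurious.
    "gradual":   coverage reaches the 5' end, or ramps up slowly after an
                 uncovered stretch, as expected from 3' bias.
    "uncovered": no coverage at all.
    """
    covered = [c for c in bins if c > 0]
    if not covered:
        return "uncovered"
    first = next(i for i, c in enumerate(bins) if c > 0)
    if first == 0:
        # Coverage extends to the 5' end, so there is no cliff.
        return "gradual"
    typical = median(covered)
    if bins[first] >= abrupt_fraction * typical:
        return "abrupt"
    return "gradual"
```

If most long transcripts come out as "gradual", the pattern looks like ordinary 3' bias; a mix of "abrupt" and fully covered transcripts would point more toward annotation problems.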

11 days ago
michael.ante ★ 3.9k

Hi Amb@r85,

What depletion method, library-prep kit, and sequencer were used to generate the reads? Depending on that, the transcripts might have a 3' bias.

What is your goal with this experiment? For differential expression it might be OK, if the bias is the same for all samples.

Cheers,

Michael


Hi Michael,

It was done with Illumina, but I am not sure whether rRNA depletion was used (I suspect it was not). I also suspect this is due to the poly-A selection. And yes, my goal is DEG, but the strongest argument everybody is listening to is that my samples will show a bias in the DEG. I don't know how to refute this argument, as I cannot find any information or articles on this issue. Do you have any resources on that?


There is (proprietary) software out there for estimating transcript abundances with the help of coverage bias (see here).

On the other hand, there are library preps, like those from 10x Genomics or QuantSeq, which have quite a strong 3' bias and are still used for DGE.


And the bias is not even on all the lengths; it is only on lengths above 5000. I was frustrated because I think the quality check is being too stringent on this matter. Thank you for the link!


If you use DEG workflows like DESeq2 or edgeR, it doesn't matter, since only read counts per gene are used there. Make sure that the 3' biases of the >5 kb transcripts are similar across samples, e.g. by making coverage plots with RSeQC for a set of transcripts longer than 5 kb.
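To build that set of long transcripts, something like the following Python sketch could be used. It assumes the gene model is in BED12 format, where column 11 holds the comma-separated exon (block) sizes, so the mature length is the sum of that column. The function names and the 5000 bp default are choices made for this example.

```python
def mature_length(bed12_line):
    """Sum of exon (block) sizes from column 11 of a BED12 record."""
    fields = bed12_line.rstrip("\n").split("\t")
    # blockSizes may carry a trailing comma, hence the `if s` filter.
    sizes = [int(s) for s in fields[10].split(",") if s]
    return sum(sizes)


def filter_long_transcripts(bed12_lines, min_length=5000):
    """Keep BED12 records whose mature length exceeds `min_length`."""
    keep = []
    for line in bed12_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        if mature_length(line) > min_length:
            keep.append(line if line.endswith("\n") else line + "\n")
    return keep
```

The filtered records can be written to a new BED file and passed to RSeQC as the reference gene model, e.g. `geneBody_coverage.py -r long_transcripts.bed -i sample.bam -o sample_long` (the file names here are placeholders), once per sample or with all BAM files together so the curves can be compared directly.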


Thank you so much! I think the bias is similar, so in a way it is not about the extraction but rather the library preparation, as you mentioned. I ran RSeQC and the curve is slightly skewed toward the 3' end, but not as much as I have seen in other data. I tested with Qualimap as well, and again I don't see a significant bias, in my opinion. However, the final decision is not up to me. :( Still, thanks! I feel like I am not so lost, even if I am new.


Well, the burden of proof doesn't fall on you lol; they're the ones making the claim.
