High BCV in edgeR, any ideas?
8.2 years ago
Biogeek ▴ 470

I have a difficult de novo RNA-seq dataset. Despite removing contamination from my samples and running the experiment as carefully as I could, I'm still getting a BCV (biological coefficient of variation) of 0.6. I've been told that this result is 'bad' and that I can't publish with such a high BCV. Can someone comment on this? The species has a genome that is 3/4 complete; if I align my reads to that instead, my BCV is 0.2 with no prior filtering.
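To put the two numbers in context: edgeR reports BCV as the square root of the negative binomial dispersion, so a count with mean mu has variance mu + BCV^2 * mu^2. Here is a minimal Python sketch of what 0.2 versus 0.6 implies. The function name and the mean count of 100 are illustrative assumptions, not values from this dataset.

```python
import math

def nb_variance(mean, bcv):
    """Variance of a negative binomial count with the given mean and BCV."""
    dispersion = bcv ** 2  # BCV is the square root of the NB dispersion
    return mean + dispersion * mean ** 2

# Hypothetical gene with a mean count of 100 (illustrative value only)
for bcv in (0.2, 0.6):
    sd = math.sqrt(nb_variance(100.0, bcv))
    print(f"BCV={bcv}: dispersion={bcv ** 2:.2f}, sd at mean 100 = {sd:.1f}")
```

At the same mean, going from a BCV of 0.2 to 0.6 raises the dispersion ninefold, which is why the two analyses look so different.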

It seems that in the de novo assembly a sizeable proportion of my genes vary across replicates, and the samples are quite heterogeneous. We even took physiological measurements before the experiment to ensure they were all at a suitable level of acclimation. All other parameters were tightly controlled to keep the experiment stringent and fair.

RNA was extracted with a uniform method, and all at the same time, to prevent batch effects. I applied TMM normalization in edgeR because some library sizes were double those of others, and kept only genes with at least 1 CPM in at least 3 samples for further analysis.
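For clarity, the filtering rule described above can be sketched in plain Python as follows. This is only an illustration of the "at least 1 CPM in at least 3 samples" logic, not the edgeR call itself: it uses raw library sizes rather than TMM-adjusted effective library sizes, and the gene names and counts are made-up toy values.

```python
def cpm(counts):
    """Counts per million from a dict of gene -> list of raw counts per sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total count per sample (raw, no TMM scaling: an assumption)
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }

def filter_by_cpm(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    values = cpm(counts)
    return {
        gene: counts[gene]
        for gene, row in values.items()
        if sum(v >= min_cpm for v in row) >= min_samples
    }

# Toy data: four samples, each with a library size of 2,000,000 reads
toy_counts = {
    "abundant": [1999994, 1999996, 1999993, 1999994],
    "kept":     [5, 4, 6, 1],  # CPM 2.5, 2.0, 3.0, 0.5: passes in 3 samples
    "dropped":  [1, 0, 1, 5],  # CPM 0.5, 0.0, 0.5, 2.5: passes in 1 sample
}
kept_genes = filter_by_cpm(toy_counts)
print(sorted(kept_genes))
```

Note that with library sizes differing twofold, a fixed CPM threshold corresponds to different raw counts in different samples, so very low-count genes can still pass the filter in the smaller libraries.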

I tried looking at the variable genes with low prior.df values; however, they seem to be random genes, with no obvious pattern emerging.

Any ideas on why the de novo assembly gives such a high BCV while the genome-aligned version gives a nice low value? The de novo assembly is made of several merged assemblies, including the genome gene models, clustered into non-redundant transcripts.

Thanks.

EdgeR BCV GLM high dispersion • 2.7k views

Are you quantifying genes in one method and transcripts in the other? I imagine that quantifying transcripts with non-optimal methods will lead to higher BCVs.
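To illustrate the concern, here is a small Python sketch of one way transcript-level counts can look noisier than gene-level counts: reads shared between similar transcripts may be assigned to different transcripts in different replicates, while the per-gene total stays stable. The transcript IDs, the transcript-to-gene mapping, and the counts are all hypothetical.

```python
def sum_to_gene_level(transcript_counts, tx2gene):
    """Sum per-sample transcript counts into per-gene counts."""
    gene_counts = {}
    for tx, row in transcript_counts.items():
        gene = tx2gene[tx]
        previous = gene_counts.get(gene, [0] * len(row))
        gene_counts[gene] = [a + b for a, b in zip(previous, row)]
    return gene_counts

# Hypothetical example: tx1 and tx2 are near-identical transcripts of gene g1,
# so reads land on one or the other depending on the replicate.
toy_transcripts = {
    "tx1": [10, 0, 12],
    "tx2": [0, 11, 1],
    "tx3": [5, 6, 5],
}
toy_tx2gene = {"tx1": "g1", "tx2": "g1", "tx3": "g2"}

toy_genes = sum_to_gene_level(toy_transcripts, toy_tx2gene)
print(toy_genes)
```

In this toy case each transcript of g1 swings between zero and about ten reads across replicates, while the summed gene count varies only slightly.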


Predicted gene models in the genome-guided analysis, and Evidential genes assembled de novo in the second. Any tips?


My suspicion is that this is some quirk of how the alignments and counts are working with your assembly. I wouldn't know what's going funky there, but that's where you should be looking.

8.2 years ago
Biogeek ▴ 470

Any knowledge out there? :-) What could be causing the huge difference between genome gene model based counts and the de novo?
