I've got a difficult de novo RNA-seq dataset. Despite removing contamination from my samples and performing the experiment to the best of my ability, I'm still getting a biological coefficient of variation (BCV) of 0.6. I've been told this result is 'bad' and that I can't publish with such a high BCV. Can someone comment on this? The species has a genome that is roughly 3/4 complete; if I align my reads to that instead, my BCV is 0.2 with no prior filtering.
In the de novo assembly, a fair proportion of my genes show variability across replicates and the samples look quite heterogeneous. We even took physiological measurements before the experiment to ensure all samples were at a suitable level of acclimation, and all other parameters were tightly controlled to keep the experiment stringent and fair.
RNA was extracted using a uniform method, and at the same time, to prevent batch effects. I have applied TMM normalisation in edgeR, as some library sizes were double those of others, and used a cut-off of at least 1 CPM in at least 3 samples for a gene to be taken forward for analysis.
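For reference, this is essentially the workflow I'm running (a minimal sketch; `counts` and `group` are placeholders for my actual count matrix and condition factor):

```r
library(edgeR)

# counts: genes x samples matrix; group: condition factor (placeholders)
y <- DGEList(counts = counts, group = group)

# keep genes with at least 1 CPM in at least 3 samples
keep <- rowSums(cpm(y) >= 1) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]

# TMM normalisation (library sizes differ by up to ~2x)
y <- calcNormFactors(y)

# estimate dispersions; BCV is the square root of the common dispersion
y <- estimateDisp(y)
sqrt(y$common.dispersion)
plotBCV(y)
```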
I tried looking at the variable genes with low prior.df values; however, they seem to be random genes and no obvious pattern emerges.
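This is roughly how I pulled those genes out (a sketch continuing from the `y` above; robust estimation gives a per-gene prior.df, and outlier genes get a low one):

```r
# robust estimation flags outlier genes with a low per-gene prior.df
design <- model.matrix(~ group)
y <- estimateDisp(y, design, robust = TRUE)

# rank genes by tagwise BCV and inspect their prior.df
ord <- order(y$tagwise.dispersion, decreasing = TRUE)
head(data.frame(
  gene     = rownames(y)[ord],
  BCV      = sqrt(y$tagwise.dispersion[ord]),
  prior.df = y$prior.df[ord]
))
```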
Any ideas why the de novo assembly has such a high BCV while the genome-aligned version gives a nicely low value? The de novo assembly was built by merging several assemblies (including the genome's model genes) and clustering them into non-redundant transcripts.
Thanks.
Are you quantifying genes with one method and transcripts with the other? I imagine that quantifying at the transcript level with non-optimal methods will lead to higher BCVs, since reads that map ambiguously among near-identical transcripts can get assigned inconsistently between samples.
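If so, one quick sanity check is to collapse the transcript counts to gene level before running edgeR and see whether the BCV drops. A minimal sketch, assuming a transcript count matrix `tx_counts` and a transcript-to-gene lookup `tx2gene` (both hypothetical names here):

```r
# tx2gene: named character vector mapping transcript IDs to gene IDs
gene_counts <- rowsum(tx_counts, group = tx2gene[rownames(tx_counts)])

# then feed gene_counts into the usual edgeR DGEList workflow
```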
Predicted gene models in the genome-guided version; EvidentialGene transcripts assembled de novo in the second. Any tips?
My suspicion is that this is some quirk of how the alignments and counts are working with your assembly. I wouldn't know what's going funky there, but that's where you should be looking.