Dear all,
I am somewhat lost for ideas on where to go with analysing my data. I'm working on a polyploid plant and recently conducted an RNA-seq experiment, which I would say was carried out rather rigorously. I have assembled the transcriptome, quantified expression with RSEM and fitted a GLM in edgeR, as the experiment was multifactorial (time points and treatments).
My problem is that my BCV (biological coefficient of variation) is rather high. Without any edgeR filtering (cpm in X samples), it sits at about 0.9. However, if I apply a pre-filter of, say, at least 2 cpm in 9 samples out of 18, it drops to 0.47. The differentially expressed genes make biological sense, but my main concern is the rather high BCV, and I cannot seem to pinpoint the reason behind it. About 1/5th of the total genes pass my filter, and from those I get about 3k differentially expressed genes across conditions. The MDS plot shows next to non-existent grouping, which is annoying. I'm therefore debating whether differential expression analysis is appropriate for this dataset, as it seems to be riddled with variation.
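To be concrete about what I mean by the pre-filter, here is a minimal sketch in plain Python. It is not the edgeR code I actually ran: the function names `cpm` and `filter_by_cpm` and the toy counts are made up for illustration, and CPM is computed from raw column sums with no normalisation factors, which is a simplifying assumption.

```python
def cpm(counts):
    """Counts per million for a genes x samples table (list of rows).

    Assumes every sample has a non-zero library size (column sum).
    """
    lib_sizes = [sum(col) for col in zip(*counts)]
    return [[c * 1e6 / n for c, n in zip(row, lib_sizes)] for row in counts]


def filter_by_cpm(counts, min_cpm=2.0, min_samples=9):
    """Indices of genes with CPM >= min_cpm in at least min_samples samples."""
    table = cpm(counts)
    return [
        i
        for i, row in enumerate(table)
        if sum(v >= min_cpm for v in row) >= min_samples
    ]


# Toy data (hypothetical): 4 genes x 4 samples, each library sums to 1e6,
# so the raw counts of the last two genes are also their CPM values.
toy_counts = [
    [500_000, 480_000, 510_000, 495_000],
    [499_990, 519_995, 489_980, 504_999],
    [10, 5, 20, 1],
    [0, 0, 0, 0],
]

# Scaled-down version of "at least 2 cpm in 9 of 18 samples":
# here, at least 2 cpm in 3 of 4 samples.
kept = filter_by_cpm(toy_counts, min_cpm=2.0, min_samples=3)
print(kept)
```

With my real data the defaults above (2 cpm, 9 samples) correspond to the filter that brings the BCV down to 0.47.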
Any ideas on how I could possibly perform an alternative analysis based on a subset of genes of interest without having the background noise of the entire dataset?
Thanks.