Hi, we did an RNA seq experiment recently. It was 3 replicates for 2 conditions. One replicate turned out to be contaminated so I have eliminated it. I want to do differential expression analysis.
I have a script for EdgeR and it works fine, however I am getting way too many up-regulated genes. When I do a summary EdgeR tells me there are 83 up-regulated genes, 13 down-regulated. This does not really make sense and there is nothing about either of these two conditions that should cause a large activation / deactivation of transcription. I'm wondering what I can do to correct this. How do I go about diagnosing if there's anything wrong with my replicates and what can be done to correct for it.
I think the dispersion was a little high Disp = 0.10443 , BCV = 0.3232. Ive tried fiddling with the counts per million cutoff and it helps a bit but I'm wondering if I can do anything else
targets <- readTargets()
x <- read.delim("EdgeR.counts.m.txt", row.names=1, stringsAsFactors=FALSE)
y <- DGEList(counts=x[,1:5], group=targets$Treatment)
colnames(y) <- targets$Label
keep <- rowSums(cpm(y)>5) >= 3
y <- y[keep,]
dim(y)
y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
y$samples
plotMDS(y)
y <- estimateCommonDisp(y, verbose=TRUE)
y <- estimateTagwiseDisp(y)
plotBCV(y)
wm <- exactTest(y, pair=c("wt","mt"))
summary(de <- decideTestsDGE(wm))
top <- topTags(wm)
top
Yes I do think 83 >> 13. You are disagreeing with this?. Even if it was 'random noise' you would expect it to be equal, if there was nothing else going on.
Oh I misunderstood. You're concerned with the imbalance. I don't think it's a problem, but I also don't have a good justification for you. Possibly one pathway is activated and another deactivated, and one happens to carry 83 genes, the other 13. You're sensitive to our definitions and discoveries of gene IDs.