Entering edit mode
2.7 years ago
lluc.cabus
▴
20
Hi everyone,
I'm doing a differential gene expression between two different groups and I have found that the volcano plot looks very weird. I think that this comes from the FDR transformation of the p-values. I have seen there are 21292 unique p-values but there are only 306 unique FDR values. Can you see what I did wrong to have these results?
Thank you very much!
The code to do the differential gene expression and the figure is as follows:
dgList <- DGEList(counts=FL_count_matrix[,1:6], genes=rownames(FL_count_matrix), group = as.factor(FL_metadata_EDTA$Spin_type))
dgList <- calcNormFactors(dgList, method="TMM")
dgList <- estimateCommonDisp(dgList, verbose = T)
dgList <- estimateGLMTrendedDisp(dgList, verbose = T)
dgList <- estimateGLMTrendedDisp(dgList, verbose = T)
dgList <- estimateGLMTagwiseDisp(dgList)
fit <- glmQLFit(dgList)
qlf <- glmQLFTest(fit, coef=2)
FDR= p.adjust(qlf$table$PValue, method="fdr")
qlf$table$FDR= FDR
qlf$table$minus_log10FDR= -log10(FDR)
p <- ggplot(data=qlf$table, aes(x=logFC, y=minus_log10FDR)) + geom_point()+
geom_hline(yintercept=1, linetype="dashed", color = "red")+ geom_vline(xintercept=c(-1,1), linetype="dashed", color = "red")
p
You simply do not seem to have any DEGs, and the FDR correction pinned all the pvalues to the same (or similar) FDR which is not unusual. Use QC strategies such as PCA or MDS (see edgeR vignette) to diagnose if confounding factors are responsible for this. Also check MA-plots. You have large fold changes in your data, so explore why these are not significant. Maybe batch effects? How many samples is it per group? 3vs3?
Okay, I also have another comparison where I have multiple differentially expressed genes but I also have something similar, this is also normal? I add the volcano plot below. I have used exactly the same code as above, but with different samples.
Sorry, I didn't see the update on the comment. I used 3vs3 here. Batch effect is not an option, since they were processed and sequenced together. In the PCA I see that they form clusters, so it is normal that they have high logFC. However, I don't understand why there are not genes with low FDR values. Could it be because the size of the data is very small and the number of genes is very high? I didn't filter any of the genes, so there are 60308 genes