I have an RNASeq experiment, and I am using DESeq2. After I get the results, I plot the MA plot. This is the output of plotMA:
And this is my attempt at the MA plot:
res$significant = (res$padj < .05)
res$significant = as.factor(res$significant)
res$significant[is.na(res$significant)] = F
ggplot(as.data.table(res), aes(x=log2(baseMean), y=log2FoldChange, color=significant)) +
geom_point() +
geom_hline(color = "blue3", yintercept = 0) +
stat_smooth(se = FALSE, method = "loess", color = "red3") +
scale_color_manual(values=c("Black","Red"))
- There is a slight bias at the end, so genes with a high A, tend to have a high M, and we are detecting more up-regulation than down. Is this a problem? What might be causing this, and more importantly, is there something we can do to fix it?
- Even if the slight effect is too little to be a problem, what causes problems like this? Imbalanced sampling depth at the two conditions? Why doesn't normalization (sample size factors) fix this?
Also, is there a reason why DESeq2::plotMA doesn't plot the best fit line?
Disclaimer: Cross posted to Biconductor questions.
The median ratio or TMM assumption for normalization is not that the majority of genes have LFC=0, but that the median or trimmed mean of ratios captures the technical shifts in counts. So it's more about the reliability of the center of the distribution of LFCs to capture technical shifts rather than requiring so many genes strictly to belong to the null. If the entire distribution is shifted, e.g. global increase in expression, then obviously computational normalization cannot be relied upon. But you could have, e.g. 40% upregulated and 20% downregulated and still have the median or trimmed mean roughly capturing the center of the distribution (the technical shift in counts due to sequencing depth).
Thanks for this precision ! (I need to read your DESeq2 paper again)