Question

edgeR TMM normalization failure

6.8 years ago

pavenhuizen ▴ 90

Dear all,

I'm running into what seems to be an issue in my differential gene expression (DGE) analysis. I have three conditions (control, mutant_1, and mutant_2), each with 3 biological replicates. I quantified my transcripts using Salmon, imported and aggregated the data with tximport, and performed the DGE analysis with edgeR, comparing each mutant against the control. The result is 1488 up-regulated and 217 down-regulated genes for mutant_1, and 1984 up-regulated and 1286 down-regulated genes for mutant_2.

The distribution of up- and down-regulated genes for mutant_1 is heavily skewed towards up-regulation, which is very different from both mutant_2 and what I expected. I want to find out whether this result reflects the biology or is (in part) an artefact of a failing method. From related posts on DGE asymmetry, I understand that the TMM normalization may have failed and that its performance can be examined using MD plots. The edgeR user guide says the following about MD plots:

Ideally, the bulk of genes should be centred at a log-fold change of zero. This indicates that any composition bias between libraries has been successfully removed.

This entry is accompanied by a single plot, without stating whether this is an example of "good" TMM normalization or a "bad" scenario.

I would like to know how I can properly test whether the TMM normalization succeeded, and what I should do if it turns out that TMM normalization is not appropriate for my samples.

Thanks! Peter

RNA-Seq edgeR TMM DGE • 2.1k views

I do not believe that there is any definitive measure that says whether normalisation has been successful or not. Such things are usually played out as you go through downstream analyses and then decide to go back a few steps and tweak some parameter until you are finally satisfied with your results.
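That said, the MD-plot criterion quoted in the question (the bulk of genes centred at a log-fold change of zero) can be turned into a rough numerical check. The following is only a sketch in plain Python, not edgeR's implementation: the counts are made up, the log-CPM formula is simplified, and the factor of 0.53 is a hypothetical value of the kind stored in `y$samples$norm.factors`.

```python
import math
from statistics import median

def log_cpm(counts, norm_factor=1.0, prior=0.5):
    # Effective library size = raw library size * normalization factor.
    # The small prior avoids log2(0); edgeR's own prior handling differs.
    eff_lib = sum(counts) * norm_factor
    return [math.log2((c + prior) / eff_lib * 1e6) for c in counts]

def median_m(sample, reference, nf_sample=1.0, nf_reference=1.0):
    # M value per gene = log-fold change of sample over reference.
    # A median far from zero suggests residual composition bias.
    s = log_cpm(sample, nf_sample)
    r = log_cpm(reference, nf_reference)
    return median(a - b for a, b in zip(s, r))

# Toy data (made up): 90 unchanged genes plus 10 genes that are strongly
# up-regulated in the mutant and soak up a large share of its reads.
control = [100] * 100
mutant = [526] * 10 + [53] * 90

before = median_m(mutant, control)                 # no normalization factor
after = median_m(mutant, control, nf_sample=0.53)  # hypothetical factor
print(f"median M without factor: {before:.2f}")
print(f"median M with factor:    {after:.2f}")
```

In this toy case the unchanged genes look down-regulated until the factor is applied. A median M near zero is necessary but not sufficient: if most genes truly shift in one direction, any method that assumes the majority is unchanged will still centre them.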

The imbalance in DEGs in your case could be reflective of an outlier sample or outlier samples - if you generate a PCA bi-plot, this will quickly become evident if it is the case.

Generally speaking, you can improve the chances of a good normalisation by removing variables (genes) that have low counts before performing the normalisation.
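As a rough illustration of such a filter, the sketch below keeps genes whose counts-per-million reach a threshold in at least as many samples as the smallest group (3 replicates here). It is a simplified stand-in in plain Python, not edgeR's `filterByExpr`; the thresholds, function names, and counts are all assumptions chosen for the example.

```python
def cpm(counts_by_sample):
    # counts_by_sample: one list of counts per sample, same gene order.
    out = []
    for sample in counts_by_sample:
        lib = sum(sample)
        out.append([c / lib * 1e6 for c in sample])
    return out

def keep_genes(counts_by_sample, min_cpm=1.0, min_samples=3):
    # A gene is kept if its CPM is >= min_cpm in at least min_samples samples.
    cpms = cpm(counts_by_sample)
    n_genes = len(counts_by_sample[0])
    keep = []
    for g in range(n_genes):
        n_pass = sum(1 for s in cpms if s[g] >= min_cpm)
        keep.append(n_pass >= min_samples)
    return keep

# Toy data (made up): 3 samples x 4 genes, each library sums to 1,000,000.
counts = [
    [600000, 399990, 10, 0],
    [550000, 449985, 12, 3],
    [620000, 379991, 9, 0],
]
keep = keep_genes(counts)
print(keep)  # the last gene passes the threshold in only one sample
```

The filter should be applied before the normalisation factors are computed, so that near-zero genes do not influence them.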

6.8 years ago by Kevin Blighe 88k