TMM normalization factors in RNA-seq analysis
7.2 years ago
sarahmanderni ▴ 120

Hi,

To my understanding, the main aim of TMM normalization is to account for library size variation between the samples of interest. I have a simulated RNA-seq dataset with equal library sizes for all samples. I ran TMM normalization and expected all normalization factors (from the calcNormFactors() function) to be equal to one. However, the factors vary from 0.4 to 2.4 (with a median of 1, of course), which is not what I expected. Have I misunderstood something here? Another question: can I use TMM normalization for non-binomial values, for instance TPM values?

Thanks in advance!

RNA-Seq TMM normalization • 28k views

Exactly how did you simulate the data? TMM is a robust measure, so if you produced very different distributions of reads then that would be the cause.


To my knowledge TMM is supposed to correct mostly for composition bias (as well as library size). If you generated samples with different compositions then it's correct that the normalisation factors would vary.
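
To illustrate the composition effect described above, here is a minimal sketch with made-up numbers. It is not edgeR's calcNormFactors(); it uses a plain median of per-gene log ratios as a stand-in for a robust estimator, and the gene values and the `sequence` helper are hypothetical. Both samples get exactly the same library size, yet the robust scaling factor is far from 1 because two strongly up-regulated genes in sample B take reads away from all the others.

```python
import math
from statistics import median

# Hypothetical toy data: 8 genes with identical true expression in both
# samples, plus 2 genes that are strongly up-regulated in sample B only.
true_a = [100] * 8 + [100, 100]
true_b = [100] * 8 + [2100, 2100]

def sequence(true_expr, depth):
    """Scale expression so that every sample has the same library size."""
    total = sum(true_expr)
    return [x * depth / total for x in true_expr]

depth = 10_000
a = sequence(true_a, depth)
b = sequence(true_b, depth)

# Library sizes are equal by construction.
print(sum(a), sum(b))

# Per-gene log2 ratios of B over A. The 8 unchanged genes all look
# "down" in B because the 2 up-regulated genes consume most of the reads.
log_ratios = [math.log2(y / x) for x, y in zip(a, b)]

# Robust summary (median here, as a simplified stand-in for a trimmed mean).
scale_b = 2 ** median(log_ratios)
print(round(scale_b, 3))
```

In this toy case the unchanged genes drop to one fifth of their counts in B, so a robust estimator reports a factor of about 0.2 for B even though the two library sizes are identical.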


I haven't produced the data myself, but yes, the distribution of the reads varies significantly. Can you elaborate a little more on what you mean by a robust measure, and in what way the distribution affects it?

7.2 years ago
h.mon 35k

The main aim of TMM normalization is to account for library size variation between the samples of interest, while allowing for the fact that a few extremely differentially expressed genes would otherwise distort the normalization. Or, as Devon Ryan said, it is a robust normalization. How does it achieve its robustness? From the paper:

A trimmed mean is the average after removing the upper and lower x% of the data.

So an assumption of TMM is that the majority of genes are not differentially expressed. And, as Devon pointed out, different distributions of gene expression will result in different TMM normalization factors.
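
The trimmed-mean idea can be sketched roughly as follows. This is a simplified illustration, not the edgeR implementation: it only trims the per-gene log ratios (M-values) and takes their plain mean, whereas the real method also trims on absolute expression and weights genes. The function names `trimmed_mean` and `simple_tmm_factor`, the trim fraction, and the toy counts are all assumptions made for this example.

```python
import math

def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

def simple_tmm_factor(sample, ref, trim=0.3):
    """Simplified scaling factor of `sample` relative to `ref`.

    M-values are per-gene log2 ratios of read proportions; genes with a
    zero count in either sample are skipped because the ratio is undefined.
    """
    n_s, n_r = sum(sample), sum(ref)
    m_values = [
        math.log2((s / n_s) / (r / n_r))
        for s, r in zip(sample, ref)
        if s > 0 and r > 0
    ]
    return 2 ** trimmed_mean(m_values, trim)

# Hypothetical counts: 9 unchanged genes and 1 extreme gene in `sample`.
ref = [100] * 10
sample = [100] * 9 + [1100]

factor = simple_tmm_factor(sample, ref)
same = simple_tmm_factor(ref, ref)
print(factor, same)
```

Because the extreme gene falls in the trimmed tail, the factor is determined only by the unchanged genes, which is the sense in which the estimate is robust. It is also why the assumption that most genes are not differentially expressed matters: if the majority of genes shifted, the trimmed mean would follow them.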


Makes sense. Will check the paper again, thanks.


Do you have any experience of applying it to TPM values?


I have none, but it seems you can do it (yes, you can).


I am also confused about normalization and the statistics behind DE programs. I am using edgeR to analyze two conditions.

Example raw counts for one gene, with four replicates per condition, control (C) and treatment (T):

gene = FBgn0034710

Controls = 820 1618 1728 1007

Treatments = 7195 1252 1312 1291

Result from edgeR:

logFC = 1.10
logCPM = 6.5
LR = 9.77
PValue = 0.0017
FDR = 0.02

Why is the FBgn0034710 gene statistically significant if only one replicate (7195) has a much higher raw count than the others? I know that library size could be a factor, but it is similar across the replicates.
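
As a rough check on the numbers above, the fold change can be approximated from the raw group means. This sketch assumes equal library sizes across replicates (as stated in the question) and ignores edgeR's dispersion estimation and model fitting entirely, so it only approximates what edgeR reports; the variable names are illustrative.

```python
import math

# Raw counts for FBgn0034710, copied from the post above.
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]

def mean(xs):
    return sum(xs) / len(xs)

# log2 fold change from plain group means (assumes equal library sizes).
log_fc_all = math.log2(mean(treatment) / mean(control))

# Same calculation with the high replicate (7195) left out.
log_fc_without_outlier = math.log2(mean(treatment[1:]) / mean(control))

print(round(log_fc_all, 2))
print(round(log_fc_without_outlier, 2))
```

With all four treatment replicates, the mean-based estimate is close to the reported logFC of 1.10; without the 7195 replicate it is close to zero. So the reported fold change appears to come almost entirely from that single replicate pulling up the group mean.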
