After Getting Normalization Factors Via edgeR, What To Do For Normalization?
11.2 years ago
Ngsnewbie ▴ 380

Dear All,

This is a pretty simple question, but I am getting confused.

I have raw count data and am using edgeR (TMM).

I got the normalization factors with the calcNormFactors function in the edgeR package. I also got final normalized values using the cpm function.

Now, when I divide the raw counts by the normalization factor of the corresponding library (I also tried multiplying), the values are not the same as those returned directly by the cpm function.

In the case of DESeq, it seems pretty simple: the raw count is just divided by the library size.

What is the next calculation for normalization after getting the scaling factors (here, the values returned by calcNormFactors)?

head(tab)
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|    4    2
CA_gi|502076654|ref|XM_004485361.1|    0    8
CA_gi|502076657|ref|XM_004485362.1|    65    62
CA_gi|502076684|ref|XM_004485369.1|    0    2
CA_gi|502076687|ref|XM_004485370.1|    26    55
CA_gi|502076690|ref|XM_004485371.1|    119    252
CA_gi|502076693|ref|XM_004485372.1|    68    70
CA_gi|502076703|ref|XM_004485375.1|    12    20
CA_gi|502076706|ref|XM_004485376.1|    0    2


 edger<-calcNormFactors(tab)
 edger
[1] 1.0536160 0.9491124


 head(cpm(tab))
ID    S1    S2
CA_gi|502076645|ref|XM_004485358.1|   3.90172   1.786435
CA_gi|502076654|ref|XM_004485361.1|   0.00000   7.145741
CA_gi|502076657|ref|XM_004485362.1|  63.40294  55.379492
CA_gi|502076684|ref|XM_004485369.1|   0.00000   1.786435
CA_gi|502076687|ref|XM_004485370.1|  25.36118  49.126969
CA_gi|502076690|ref|XM_004485371.1| 116.07616 225.090840

For example, for the first gene in sample S1:

4 / 1.0536160 = 3.79644956 (not equal to 3.90172), and 4 * 1.0536160 = 4.214464 (again not equal to 3.90172)

normalization edger • 14k views
11.2 years ago

The TMM counts are:

count / (library size * normalization factor)

Then multiply that by a million to get CPM.

Not

count / normalization factor
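
To make the arithmetic concrete, here is a minimal base-R sketch of the formula above. The counts and normalization factors are copied from the question, but the library sizes are made-up values (the real ones are the column sums of the full count table, which the post does not show), so the numbers will not reproduce the cpm output in the question.

```r
# Counts for the first three genes shown in the question (rows = genes, columns = samples).
counts <- matrix(c(4, 2,
                   0, 8,
                   65, 62),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("gene1", "gene2", "gene3"), c("S1", "S2")))

# ASSUMPTION: hypothetical library sizes, for illustration only.
lib_size <- c(S1 = 2.0e6, S2 = 2.5e6)

# Normalization factors as reported by calcNormFactors in the question.
norm_factors <- c(S1 = 1.0536160, S2 = 0.9491124)

# The normalization factor rescales the library size, not the count itself.
effective_lib_size <- lib_size * norm_factors

# count / (library size * normalization factor), then times one million.
cpm_tmm <- sweep(counts, 2, effective_lib_size, "/") * 1e6

# What the question tried instead: count / normalization factor.
cpm_naive <- sweep(counts, 2, norm_factors, "/")

print(cpm_tmm)
print(cpm_naive)
```

The two matrices differ because the naive version never divides by the library size.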

And DESeq doesn't just do a simple division by library size. For each library, it takes the median, across genes, of the ratio of each gene's count to that gene's geometric mean across all samples, and uses that median as the library's scaling factor.
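
A rough base-R sketch of that median-of-ratios idea is below. It is not DESeq's own code; the function name `size_factors_median_ratio` is made up for illustration, and the counts are the non-header rows visible in the question rather than a full table. Genes with a zero in any sample are skipped, since their geometric mean is zero.

```r
# Hypothetical helper illustrating the median-of-ratios scaling factor.
size_factors_median_ratio <- function(counts) {
  log_counts <- log(counts)
  # Log of the per-gene geometric mean across samples (-Inf if any count is 0).
  log_geo_means <- rowMeans(log_counts)
  usable <- is.finite(log_geo_means)
  # Per sample: median over usable genes of count / geometric mean.
  apply(log_counts, 2, function(sample_log) {
    exp(median((sample_log - log_geo_means)[usable]))
  })
}

# Some of the counts shown in the question (rows = genes, columns = samples).
counts_deseq <- matrix(c(4, 2,
                         0, 8,
                         65, 62,
                         26, 55,
                         119, 252,
                         68, 70,
                         12, 20),
                       ncol = 2, byrow = TRUE,
                       dimnames = list(paste0("gene", 1:7), c("S1", "S2")))

sf <- size_factors_median_ratio(counts_deseq)
print(sf)

# Normalized counts: each column divided by its own size factor.
normalized <- sweep(counts_deseq, 2, sf, "/")
print(normalized)
```

Note that the size factor already absorbs sequencing depth, so there is no separate division by library size here.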

Thanks Damian for the rectification :)

Hi Damian Kao, I am trying TMM normalization with my miRNA-seq data. I am new to R programming, so can you tell me: 1. What should my input data look like? (I have raw counts.) 2. Can I get the R code for TMM normalization? Thanks in advance.

11.2 years ago

The entire point behind TMM normalization is to not rely solely on summed count numbers (e.g., cpm, aka counts per million). So it's unclear why you'd find it surprising that multiplying or dividing the raw count by the library size normalization factor won't produce the counts per million. BTW, this will also be the case for DESeq, where the same computation also won't be equivalent to cpm. The next step is estimateCommonDisp(edger) and so on. See the edgeR vignette.
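
For anyone wanting to see the pieces together, here is a small sketch of the usual edgeR route up to the normalized CPM values. It assumes the Bioconductor package edgeR is installed and that the input is a numeric matrix of raw counts with genes in rows and samples in columns. The wrapper name `tmm_cpm_edger` and the simulated counts are made up for illustration; `DGEList`, `calcNormFactors` and `cpm` are the edgeR functions.

```r
# Hypothetical wrapper: raw count matrix in, DGEList and TMM-normalized CPM out.
tmm_cpm_edger <- function(counts) {
  y <- edgeR::DGEList(counts = counts)            # stores counts and library sizes
  y <- edgeR::calcNormFactors(y, method = "TMM")  # fills in y$samples$norm.factors
  list(dge = y, cpm = edgeR::cpm(y))              # cpm() on a DGEList uses the factors
}

if (requireNamespace("edgeR", quietly = TRUE)) {
  # ASSUMPTION: simulated counts, only so that the example runs on its own.
  set.seed(1)
  sim_counts <- matrix(rnbinom(400, mu = 50, size = 5), ncol = 4,
                       dimnames = list(paste0("gene", 1:100), paste0("S", 1:4)))

  res <- tmm_cpm_edger(sim_counts)

  # Rebuild the same values by hand: count / (lib.size * norm.factor) * 1e6.
  eff_lib <- res$dge$samples$lib.size * res$dge$samples$norm.factors
  manual <- sweep(sim_counts, 2, eff_lib, "/") * 1e6

  print(all.equal(manual, res$cpm, check.attributes = FALSE))
}
```

The DGEList in `res$dge` is also the kind of object that functions such as estimateCommonDisp expect for the differential expression steps that follow.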
