Question

DESeq2 normalization - Found Difference in values when calculated manually

0

Entering edit mode

3.2 years ago

Sandhiya ▴ 20

Hi All,

I learnt about how DESeq2 normalises the raw read counts in statquest. It was very helpful and I tried to repeat the steps for my sample data.

But I am getting the output with minor difference - like for each sample outpu I get some difference .

Example:

Normalised Output from DESeq2

The above is the DESeq2 normalised count which I retrieved using count() in DESeq2 library.

When I manually calculate the normalization I get the following output :

Manually calculated normalisation values

So there is difference . I'm confused why i get this different values ?

Please help me to sort this out..

Many thanks in advance..

DESeq2 normalization difference • 3.4k views

ADD COMMENT • link 3.2 years ago by Sandhiya ▴ 20

0

Entering edit mode

What is the code you're using to maually calculate the values?

ADD REPLY • link 3.2 years ago by rpolicastro 13k

0

Entering edit mode

For manual calculation I have not used code. I tried to do it following the steps:

Please check the steps:

step 1:take log of all values - ie.,read count Step 2: Average each row Step 3: filter zero and infinite values - not applicable for our data Step : 4 Subtract the averae log value from log(counts) Step5 : calculate median for each sample Step 6: convert medians to normal numbers to get the final scoring factors for each sample Step 7 : Divide original read counts by scaling factors:

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

0

Entering edit mode

This is the exact code you can use to double-check: https://github.com/mikelove/DESeq2/blob/master/R/core.R#L534-L577

ADD REPLY • link 3.2 years ago by ATpoint 85k

0

Entering edit mode

Thank you for the code . I have exactly followed the same steps. But I get a different output when I calculate manually.

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

0

Entering edit mode

Do you take the natural log rather than log10 or log2?

ADD REPLY • link 3.2 years ago by ATpoint 85k

0

Entering edit mode

I used log10 and checked output - it was different; log2 result is also different for deseq output..

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

score 3 · Answer 1 · 2021-09-09

3

Entering edit mode

3.2 years ago

ATpoint 85k

I cannot see your code or your notes that you write down but here is how it goes in R using only base functions, following what the StatQuest video does with some dummy data. Results are identical to DESeq2, because StatQuest is awesome :)

data <- data.frame(Sample1=c(1000, 1098, 2234), Sample2=c(987, 778, 888), Sample3=c(999,888,0),
                   row.names = c("gene1", "gene2", "gene3"))

#/ Step1: take the log of all values (natural log!)
step1 <- log(data)

#/ Step2: take the mean:
step2 <- apply(step1, 1, mean)

#/ Step3: remove infinities:
step3 <- step2[is.finite(step2)]

#/ Step4: subtract the average log value from the log-transformed raw counts:
step4 <- step1[names(step3),] - step2[names(step3)]

#/ Step5: calculate the median for each sample (so column-wise)
step5 <- apply(step4, 2, median)

#/ Step6: put back to normal scale:
step6 <- exp(step5)

data.frame(manually=step6,
           DESeq2=as.numeric(DESeq2::estimateSizeFactorsForMatrix(data)))

   manually    DESeq2
1 1.0998199 1.0998199
2 0.9197479 0.9197479
3 0.9885750 0.9885750

Edit: Here is a spreadsheet with an Excel implementation: https://uni-muenster.sciebo.de/s/xQHwwyVrG9NFqst

ADD COMMENT • link 3.2 years ago by ATpoint 85k

1

Entering edit mode

BAM ! #statquest

ADD REPLY • link 3.2 years ago by Nicolas Rosewick 11k

0

Entering edit mode

Thank you. I much appreciate your solution and Statquest - of course, I have learnt many concepts from this channel.

But for the example which you have discussed here, I'm not getting the values that you have mentioned.

I request you to correct me with my steps:

enter image description here

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

0

Entering edit mode

I can confirm ATpoint 's calculations on my machine. Not sure what is going on but i think it might be during your scaling step

ADD REPLY • link 3.2 years ago by LauferVA 4.5k

0

Entering edit mode

I'm still figuring out "why there is difference"

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

1

Entering edit mode

when i am having trouble replicating results, I normally start by eliminating any and all possible differences of which I am aware.

This is a personal thing, but I would start in R, since that's how DESeq2 is implemented...

hope it helps!

ADD REPLY • link 3.2 years ago by LauferVA 4.5k

0

Entering edit mode

Thank you! You mean to say - just focus on the normalized count values obtained from deseq2 output. Right?

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20

0

Entering edit mode

You can implement it in any language you want, even Excel...the steps are not difficult but I wonder why you do not get the results you want. Be sure that the order of operations is correct, e.g. in step4 it is logged counts minus the means of step1 and not vice versa.

I added a link to a spreadsheet with an Excel version to my answer.

ADD REPLY • link 3.2 years ago by ATpoint 85k

0

Entering edit mode

Thank you ATpoint . I checked your excel - you have implemented all steps but in your excel Step 7 : Divide original read counts by scaling factorsis missing. This step 7 is the final one which gives the normalized counts - as per the Statquest video . You also check the step 7 - final result is not the same when you compare with the one from deseq2 normalized count.

ADD REPLY • link 3.2 years ago by Sandhiya ▴ 20