Question

DESeq2 size factors change when the number of samples in the design changes

1

6.2 years ago

ZheFrench ▴ 590

For example, when you analyse the Untreated condition vs the Day1Treatment condition with only these samples in your design, you get one set of size factors.

If you have also set up a Day6Treatment condition in your design (but don't use it yet) and want to do the same Untreated vs Day1Treatment comparison, the size factors will change, because the Day6 samples are taken into account.

I thought a sample's size factor depended only on that sample's library size, so why does it change with the design file even when a set of samples is not used in the comparison?

I was analysing Day1Treatment vs Untreated and Day6Treatment vs Untreated with two separate design files. Now I am wondering whether it is better to have one design with all the samples and run both comparisons from it, so that the size factors are the same, because at the end of the day the two approaches detect different differentially expressed genes.

DESEQ2 • 2.3k views

ADD COMMENT • link updated 6.0 years ago by erwan.scaon ▴ 950 • written 6.2 years ago by ZheFrench ▴ 590

0

6.0 years ago

erwan.scaon ▴ 950

So if you have 1 large experiment its better to put everything together and then perform comparisons

Is this still true if the large experiment was sequenced over multiple Illumina runs (i.e. compute size factors for all samples across all runs)? Or should we compute size factors separately for the samples of each Illumina run?

ADD COMMENT • link 6.0 years ago by erwan.scaon ▴ 950

0

You should put everything together and, ideally, also add the batch effect to your design formula.

ADD REPLY • link 6.0 years ago by grant.hovhannisyan ★ 2.6k

score 5 · Accepted Answer · 2018-09-21

5

6.2 years ago

grant.hovhannisyan ★ 2.6k

Size factors are calculated from all the samples in your dds object. So if you have one large experiment, it's better to put everything together and then perform the comparisons, for example using contrasts, rather than making new dds objects from subsets of your original dataset.
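To illustrate why the size factors depend on which samples are in the object, here is a minimal Python sketch of a median-of-ratios calculation. It is not DESeq2 itself: the function name `size_factors` and the toy counts are made up for illustration. Each gene's reference value is its geometric mean across all samples present, so dropping the Day6 columns changes the reference and therefore every size factor.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample, so all logs are finite.
    keep = (counts > 0).all(axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene reference: log of the geometric mean across the samples present.
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Per-sample size factor: median over genes of count / reference.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

# Hypothetical toy data: 5 genes, identical expression profile, different depths.
profile = np.array([10, 20, 30, 40, 50])
depth = np.array([1, 2, 1, 2, 4, 4])  # Untreated x2, Day1 x2, Day6 x2
counts = np.outer(profile, depth)

sf_all = size_factors(counts)         # all six samples
sf_sub = size_factors(counts[:, :4])  # Untreated and Day1 only

print(np.round(sf_all, 3))  # approx. 0.5, 1, 0.5, 1, 2, 2
print(np.round(sf_sub, 3))  # approx. 0.707, 1.414, 0.707, 1.414
```

In this idealised case the ratios between samples are preserved and only the overall scale shifts. With real counts the median can land on different genes once samples are added or removed, so the ratios between samples can shift slightly as well.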

ADD COMMENT • link 6.2 years ago by grant.hovhannisyan ★ 2.6k

0

Agreed. The idea is that you estimate a size factor for each column that best scales the datasets, based on a large set of genes that do not change between conditions. Provided that you do not have samples with extreme global changes, it is probably best to have as many samples in the matrix as possible. This probably produces more robust size factors than using only two or three samples.

ADD REPLY • link 6.2 years ago by ATpoint 85k

0

OK, I got it, but do you know how it is computed?

ADD REPLY • link 6.2 years ago by ZheFrench ▴ 590

1

It is described in the original DESeq paper (https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-10-r106), and I think the same method is used in DESeq2.
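For reference, the median-of-ratios estimator can be written roughly as below, where k_{ij} is the count of gene i in sample j and m is the number of samples in the object. The notation is a sketch and should be checked against the paper.

```latex
\hat{s}_j \;=\; \operatorname*{median}_{i}\;
  \frac{k_{ij}}{\left( \prod_{v=1}^{m} k_{iv} \right)^{1/m}}
```

The denominator is the geometric mean of gene i over all m samples, which is why adding or removing samples changes every \hat{s}_j.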

ADD REPLY • link 6.2 years ago by grant.hovhannisyan ★ 2.6k

1

Yes, the concept is actually pretty simple but powerful. Check out StatQuest for a nice explanation.

ADD REPLY • link 6.0 years ago by ATpoint 85k