Question

How does deseq2 encode more than 2 levels

8.4 years ago

-_- ★ 1.1k

When a factor has two levels, it can be encoded as 0 and 1. What about a factor with three levels, then? Is it one-hot encoding, or something like that, when DESeq2 fits a generalized linear model over the factors? I haven't found this information in the paper or the user guide yet.

If you could point me to the relevant part of the source code, that would be even better. Thanks.

RNA-Seq DESeq2 differential expression • 2.0k views

updated 6.6 years ago by Biostar 20 • written 8.4 years ago by -_- ★ 1.1k

score 1 · Accepted Answer · 2017-06-14

DESeq2 uses model.matrix, so you can plug your design formula and colData into this base R function to see how the factors will be encoded.
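For the three-level case in the question, here is a minimal sketch using only base R. The sample table, the factor name condition, and its levels A, B and C are hypothetical, not taken from the original post; the call is the same model.matrix call the answer says DESeq2 relies on for a design like ~ condition.

```r
# Hypothetical sample table: one factor with three levels (A, B, C),
# two replicates each. Not taken from the original post.
coldata <- data.frame(
  condition = factor(c("A", "A", "B", "B", "C", "C")),
  row.names = paste0("sample", 1:6)
)

# Build the design matrix for ~ condition, as model.matrix would for DESeq2.
mm <- model.matrix(~ condition, coldata)
print(mm)
```

The result has three columns, (Intercept), conditionB and conditionC: samples at level A get [0, 0] in the two indicator columns, B samples get [1, 0], and C samples get [0, 1].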

Quoted from https://support.bioconductor.org/p/77620/#97059

> model.matrix(~participant+sampleType, coldata)
             (Intercept) participantX8326 participantX8329 sampleTypetumor
X8324_normal           1                0                0               0
X8324_tumour           1                0                0               1
X8326_normal           1                1                0               0
X8326_tumour           1                1                0               1
X8329_normal           1                0                1               0
X8329_tumour           1                0                1               1

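So it's not really one-hot encoding but treatment (dummy) coding, R's default for unordered factors: with an intercept in the model, a factor with k levels gets k - 1 indicator columns, and the reference level (the first level, participant X8324 here) is represented as [0, 0].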
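To make the contrast with one-hot encoding concrete, here is a sketch on the same kind of hypothetical three-level table (the names coldata, condition, A, B and C are illustrative only). Dropping the intercept with ~ 0 + condition gives one column per level, which for a single factor is true one-hot encoding; relevel() changes which level becomes the all-zero reference when the intercept is kept. This only illustrates base R behaviour and says nothing about which design is appropriate for a given DESeq2 analysis.

```r
# Hypothetical sample table, same layout as a three-level example.
coldata <- data.frame(
  condition = factor(c("A", "A", "B", "B", "C", "C")),
  row.names = paste0("sample", 1:6)
)

# No intercept: one indicator column per level, so each row has exactly one 1.
one_hot <- model.matrix(~ 0 + condition, coldata)
print(one_hot)

# With an intercept, the first factor level is the reference (all zeros).
# relevel() makes "C" the reference, so C samples become [0, 0] instead of A.
coldata$condition <- relevel(coldata$condition, ref = "C")
treatment <- model.matrix(~ condition, coldata)
print(treatment)
```

With more than one factor, dropping the intercept only gives full one-hot columns for the first factor; the others still use k - 1 columns, so the matrix stays full rank.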