Question

Explanation of DESeq factor leveling

1

Entering edit mode

11 weeks ago

bthom ▴ 10

Can someone help me to understand why the following code returns different results for df (66 DEGs) and df_1 (113 DEGs). As far as I understand, the two code chunks should be interchangeable (aside from a difference in the signs of the log2FC).

dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = sample_matrix,
                          design = ~MFG)
dds$MFG <- relevel(dds$MFG , ref = "Good_Unknown")
dds <- DESeq(dds)
res <- results(dds, contrast = c("MFG", "Good_Unknown", "Fail"))
resLFC<- lfcShrink(dds, coef = "MFG_Fail_vs_Good_Unknown", type="apeglm")
df <- as.data.frame(resLFC)
df |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)

and

dds_1 <- DESeqDataSetFromMatrix(countData = cts,
                          colData = sample_matrix,
                          design = ~MFG)
dds_1$MFG <- factor(dds_1$MFG, levels = c("Fail","Good_Unknown"))
dds_1 <- DESeq(dds_1)
res_1 <- results(dds_1)
resultsNames(dds_1)
resLFC_1 <- lfcShrink(dds_1, coef="MFG_Good_Unknown_vs_Fail", type="apeglm")
df_1 <- as.data.frame(resLFC_1)
df_1 |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)

DESeq • 634 views

ADD COMMENT • link 10 weeks ago by bthom ▴ 10

1

Entering edit mode

It has been reported before that the choice of reference level can lead to some differences in how apeglm estimates the shrunken fold changes. And since you filter on these that ight be the cause. Can you check whether padj are the same between both and fold changes are a bit different? The solution anyway is to decide for one reference level and stick with it. Unfortunately, these sorts of statistical procedures have the downside that they might show some unexpected behaviour in some corner cases, so decide for one and go on.

ADD REPLY • link 11 weeks ago by ATpoint 86k

0

Entering edit mode

Hm, thanks for the insight. It seems like both the fold changes and the padj are slightly different. Can you explain how the choice of the reference levels differs between my two code chunks and how this can alter apeglm's behavior? Shouldn't they be the same?

ADD REPLY • link 11 weeks ago by bthom ▴ 10

0

Entering edit mode

I cannot explain why it alters the behaviour, that is beyond my understanding of the apeglm internals, but guess who asked this exact same question and got an answer here :-D https://support.bioconductor.org/p/p134551/

ADD REPLY • link 11 weeks ago by ATpoint 86k

0

Entering edit mode

Thanks for sharing this! Would it be bad practice to reply to Mike's comment on bioconductor? I see that he had a possible explanation for your case (>6 levels), but here I only have 2 levels.

ADD REPLY • link 10 weeks ago by bthom ▴ 10

0

Entering edit mode

I assume the difference in whether factor was used isn't doing anything because the elements of MFG are strings, and not numbers?

ADD REPLY • link 11 weeks ago by swbarnes2 14k

0

Entering edit mode

I am not sure that I follow, factors in R are stored as integers with labels assigned to them. Furthermore, the code example in the DESeq vignette uses strings: "condition","treated","untreated"

ADD REPLY • link 11 weeks ago by bthom ▴ 10