Explanation of DESeq factor leveling
0
1
Entering edit mode
23 days ago
bthom ▴ 10

Can someone help me to understand why the following code returns different results for df (66 DEGs) and df_1 (113 DEGs). As far as I understand, the two code chunks should be interchangeable (aside from a difference in the signs of the log2FC).

dds <- DESeqDataSetFromMatrix(countData = cts,
                          colData = sample_matrix,
                          design = ~MFG)
dds$MFG <- relevel(dds$MFG , ref = "Good_Unknown")
dds <- DESeq(dds)
res <- results(dds, contrast = c("MFG", "Good_Unknown", "Fail"))
resLFC<- lfcShrink(dds, coef = "MFG_Fail_vs_Good_Unknown", type="apeglm")
df <- as.data.frame(resLFC)
df |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)

and

dds_1 <- DESeqDataSetFromMatrix(countData = cts,
                          colData = sample_matrix,
                          design = ~MFG)
dds_1$MFG <- factor(dds_1$MFG, levels = c("Fail","Good_Unknown"))
dds_1 <- DESeq(dds_1)
res_1 <- results(dds_1)
resultsNames(dds_1)
resLFC_1 <- lfcShrink(dds_1, coef="MFG_Good_Unknown_vs_Fail", type="apeglm")
df_1 <- as.data.frame(resLFC_1)
df_1 |> filter(log2FoldChange > 0.58 | log2FoldChange < -0.58) |> filter(padj < 0.1)
DESeq • 518 views
ADD COMMENT
1
Entering edit mode

It has been reported before that the choice of reference level can lead to some differences in how apeglm estimates the shrunken fold changes. And since you filter on these that ight be the cause. Can you check whether padj are the same between both and fold changes are a bit different? The solution anyway is to decide for one reference level and stick with it. Unfortunately, these sorts of statistical procedures have the downside that they might show some unexpected behaviour in some corner cases, so decide for one and go on.

ADD REPLY
0
Entering edit mode

Hm, thanks for the insight. It seems like both the fold changes and the padj are slightly different. Can you explain how the choice of the reference levels differs between my two code chunks and how this can alter apeglm's behavior? Shouldn't they be the same?

ADD REPLY
0
Entering edit mode

I cannot explain why it alters the behaviour, that is beyond my understanding of the apeglm internals, but guess who asked this exact same question and got an answer here :-D https://support.bioconductor.org/p/p134551/

ADD REPLY
0
Entering edit mode

Thanks for sharing this! Would it be bad practice to reply to Mike's comment on bioconductor? I see that he had a possible explanation for your case (>6 levels), but here I only have 2 levels.

ADD REPLY
0
Entering edit mode

I assume the difference in whether factor was used isn't doing anything because the elements of MFG are strings, and not numbers?

ADD REPLY
0
Entering edit mode

I am not sure that I follow, factors in R are stored as integers with labels assigned to them. Furthermore, the code example in the DESeq vignette uses strings: "condition","treated","untreated"

ADD REPLY

Login before adding your answer.

Traffic: 1545 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6