Question

Non-symmetric shrinkage with apeglm?

2

Entering edit mode

4.4 years ago

CJP-OHRI ▴ 30

I have been analysing an RNA-seq data set with multiple conditions (injury and treatments), and have generated many DE gene sets using the DESeq2 lfcShrink() function with the apeglm method.

As the PI is preparing to publish the results he has asked that one of the volcano plots should be mirrored, so I recalculated the fold changes with the conditions reversed and replotted. I was surprised to find that the fold changes are not the same with the sign reversed, as I had expected they would be. I repeated in a vanilla DESeq2 session and confirmed that whilst the inverse comparison using the results() function results in a sign switch for the fold changes, using lfcShrink modifies the fold change estimates depending on which condition is used as the baseline.

Can anyone explain why this happens when using lfcShrink/apeglm? I had naively expected the changes to be mirrored as happens with the non-shrunken fold changes. (I'll also go back and re-read the apeglm paper to see if I can spot the reason in there).

Thanks,

Chris

RNA-Seq DESeq2 apeglm • 1.4k views

ADD COMMENT • link updated 13 months ago by Ram 45k • written 4.4 years ago by CJP-OHRI ▴ 30

0

Entering edit mode

I am also having the same issue , the example below uses the pasilla dataset used in the DESeq2 vignette to illustrate the problem.

require(DESeq2)
library("pasilla")

pasCts <- system.file("extdata","pasilla_gene_counts.tsv",package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
"pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)

cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))

coldata <- read.csv(pasAnno, row.names=1)

coldata <- coldata[,c("condition","type")]

coldata$condition <- factor(coldata$condition)

coldata$type <- factor(coldata$type)

rownames(coldata) <- sub("fb", "", rownames(coldata))


cts <- cts[, rownames(coldata)]


dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)

featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)


dds$condition <- factor(dds$condition, 
                        levels = c("untreated","treated"))

dds$condition <- droplevels(dds$condition)

dds <- DESeq(dds)

res_shrink<-lfcShrink(dds,coef = "condition_treated_vs_untreated")

# repeat by change the base condition to treated
dds2<-dds
dds2$condition=relevel(dds2$condition,ref = "treated")
dds2<-DESeq(dds2)
res2_shrink<-lfcShrink(dds2,coef = "condition_untreated_vs_treated")

# compare the absolute log2FoldChange when changing the base contion
sum(abs(res2_shrink$log2FoldChange)!=abs(res_shrink$log2FoldChange),na.rm = T)
# 12359

#max difference in LogFC value
max(abs(res2_shrink$log2FoldChange)-abs(res_shrink$log2FoldChange),na.rm = T)
# 1.915671

SessionInfo below

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pasilla_1.26.0              DEXSeq_1.44.0               RColorBrewer_1.1-3          AnnotationDbi_1.60.2       
 [5] BiocParallel_1.32.6         DESeq2_1.38.3               SummarizedExperiment_1.28.0 Biobase_2.58.0             
 [9] MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
[13] IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0

ADD REPLY • link updated 13 months ago by Ram 45k • written 13 months ago by Graeme • 0

score 1 · Answer 1 · 2024-06-06

It is known and has been reported (many times) before that in apeglm the choice of reference level can have an influence on the shrinkage estimators. See for example: https://support.bioconductor.org/p/p134551/

Meaning, choose a reference level and stick with it.

I think the bottom line is (as usual) that more complicated methods tend to generally have advantages and often produce "better" results, but have the downside that the added complexity opens pitfalls that simpler methods (such as no shrinkage at all) do not -- at the price of inferior performance.