Non-symmetric shrinkage with apeglm?
1
2
Entering edit mode
3.7 years ago
CJP-OHRI ▴ 30

I have been analysing an RNA-seq data set with multiple conditions (injury and treatments), and have generated many DE gene sets using the DESeq2 lfcShrink() function with the apeglm method.

As the PI is preparing to publish the results he has asked that one of the volcano plots should be mirrored, so I recalculated the fold changes with the conditions reversed and replotted. I was surprised to find that the fold changes are not the same with the sign reversed, as I had expected they would be. I repeated in a vanilla DESeq2 session and confirmed that whilst the inverse comparison using the results() function results in a sign switch for the fold changes, using lfcShrink modifies the fold change estimates depending on which condition is used as the baseline.

Can anyone explain why this happens when using lfcShrink/apeglm? I had naively expected the changes to be mirrored as happens with the non-shrunken fold changes. (I'll also go back and re-read the apeglm paper to see if I can spot the reason in there).

Thanks,

Chris

RNA-Seq DESeq2 apeglm • 1.1k views
ADD COMMENT
0
Entering edit mode

I am also having the same issue , the example below uses the pasilla dataset used in the DESeq2 vignette to illustrate the problem.

require(DESeq2)
library("pasilla")

pasCts <- system.file("extdata","pasilla_gene_counts.tsv",package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
"pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)

cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))

coldata <- read.csv(pasAnno, row.names=1)

coldata <- coldata[,c("condition","type")]

coldata$condition <- factor(coldata$condition)

coldata$type <- factor(coldata$type)

rownames(coldata) <- sub("fb", "", rownames(coldata))


cts <- cts[, rownames(coldata)]


dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = coldata,
                              design = ~ condition)

featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)


dds$condition <- factor(dds$condition, 
                        levels = c("untreated","treated"))

dds$condition <- droplevels(dds$condition)

dds <- DESeq(dds)

res_shrink<-lfcShrink(dds,coef = "condition_treated_vs_untreated")

# repeat by change the base condition to treated
dds2<-dds
dds2$condition=relevel(dds2$condition,ref = "treated")
dds2<-DESeq(dds2)
res2_shrink<-lfcShrink(dds2,coef = "condition_untreated_vs_treated")

# compare the absolute log2FoldChange when changing the base contion
sum(abs(res2_shrink$log2FoldChange)!=abs(res_shrink$log2FoldChange),na.rm = T)
# 12359

#max difference in LogFC value
max(abs(res2_shrink$log2FoldChange)-abs(res_shrink$log2FoldChange),na.rm = T)
# 1.915671

SessionInfo below

R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pasilla_1.26.0              DEXSeq_1.44.0               RColorBrewer_1.1-3          AnnotationDbi_1.60.2       
 [5] BiocParallel_1.32.6         DESeq2_1.38.3               SummarizedExperiment_1.28.0 Biobase_2.58.0             
 [9] MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2        GenomeInfoDb_1.34.9        
[13] IRanges_2.32.0              S4Vectors_0.36.2            BiocGenerics_0.44.0
ADD REPLY
1
Entering edit mode
5 months ago
ATpoint 85k

It is known and has been reported (many times) before that in apeglm the choice of reference level can have an influence on the shrinkage estimators. See for example: https://support.bioconductor.org/p/p134551/

Meaning, choose a reference level and stick with it.

I think the bottom line is (as usual) that more complicated methods tend to generally have advantages and often produce "better" results, but have the downside that the added complexity opens pitfalls that simpler methods (such as no shrinkage at all) do not -- at the price of inferior performance.

ADD COMMENT

Login before adding your answer.

Traffic: 1800 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6