Conflicting results with DESeq2 between versions
1
2
Entering edit mode
7.5 years ago
BioBing ▴ 150

Hi all,

I have been using the Kallisto -> Sleuth pipeline on my RNAseq data consisting of a control and two treatments (in triplicates). From that I got 8 DETs (q-value < 0.05) for treatment 1 and 12 DETs for treatment 2. The highest beta ("equivalent" to log fold change) were around 6 and the lowest -4.

In order to validate my results from Sleuth, I wanted to use DESeq2 to see if I got similar results (+ make "prettier"/customizable figures). Tximport were used to load the count data from Kallisto into the DESeq2 pipeline. The first runs with DESeq2 were pretty consistent with the Sleuth results (considering Sleuth is more conservative than DESeq2) with 19 DETs for treatment1 and 34 for treatment 2. The highest/lowest log fold change was around 6.6 and -3.

BUT then... I updated by version of R and DESeq2 and suddenly I got completely different results. The exact same R script (with updated packages) have then 100 DETs for treatment1 and 143 for treatment2 with highest/lowest log fold changes around 21 and -25.5.

This is really weird - have others of you had similar experiences? What causes this? and how do I "fix" it (especially the log fold changes seems way off in the second analysis)

The code:

#Libraries
library(tximport)
library(rhdf5)
library(DESeq2)

#base directory:
dir <- "~/Documents/RNASeq/kallisto/data/"

#Metadata
samples <- read.table(file.path(dir, "metadata.txt"), header=TRUE, stringsAsFactors = FALSE)

#Kallisto-data
files <- file.path(dir,"results", samples$path, "abundance.tsv")
names(files)<- paste0(samples$Sample)
txi <- tximport(files, type = "kallisto", txOut = TRUE)

#DESeq2
sampleTable <- data.frame(condition = factor(rep(c("Treatment1", "Treatment2", "Control"), each = 3)))
rownames(sampleTable) <- colnames(txi$counts)
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)

#Pre-filtering
dds <- dds[rowSums(counts(dds))>1, ]

#Control = first level
dds$condition <- relevel(dds$condition, ref="Control")

#Run DESeq2
 dds <-DESeq(dds)

#Call and inspect the results table
res <- results(dds)

# results for each treatment vs. control
res_treat1 <- results(dds, contrast=c("condition", "Treatment1", "Control"))
res_treat2 <- results(dds, contrast=c("condition", "Treatment2", "Control"))

# Call information on the result object on which variables and tests were used:
mcols(res_treat1, use.names=TRUE)
mcols(res_treat2, use.names=TRUE)

# Call number of significant results with BH-adjusted p-value < 0.05
sum(res_treat1$padj < 0.05, na.rm=TRUE)
sum(res_treat2$padj < 0.05, na.rm=TRUE)

#Subset the results and sort them by the log2 fold change down-regulation:
 resSig_treat1<- res[which(res_treat1$padj < 0.05), ]
 head(resSig_treat1[ order(resSig_treat1$log2FoldChange ), ])

 resSig_treat2<- res[which(res_treat2$padj < 0.05), ]
 head(resSig_treat2[ order(resSig_treat2$log2FoldChange ), ])

 #And sort by up-regulations:
 tail(resSig_treat1[order(resSig_treat1$log2FoldChange ), ] )
 tail(resSig_treat2[order(resSig_treat2$log2FoldChange ), ] )

Session-info (for the updated run):

R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] da_DK.UTF-8/da_DK.UTF-8/da_DK.UTF-8/C/da_DK.UTF-8/da_DK.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.17.2              SummarizedExperiment_1.7.4 DelayedArray_0.3.9         matrixStats_0.52.2        
 [5] Biobase_2.37.2             GenomicRanges_1.29.4       GenomeInfoDb_1.13.4        IRanges_2.11.3            
 [9] S4Vectors_0.15.3           BiocGenerics_0.23.0        rhdf5_2.21.1               tximport_1.5.0            

loaded via a namespace (and not attached):
 [1] genefilter_1.59.0       locfit_1.5-9.1          splines_3.4.0           lattice_0.20-35        
 [5] colorspace_1.3-2        htmltools_0.3.6         base64enc_0.1-3         survival_2.41-3        
 [9] XML_3.98-1.7            rlang_0.1.1             DBI_0.6-1               foreign_0.8-68         
[13] BiocParallel_1.11.2     RColorBrewer_1.1-2      GenomeInfoDbData_0.99.1 plyr_1.8.4             
[17] stringr_1.2.0           zlibbioc_1.23.0         munsell_0.4.3           gtable_0.2.0           
[21] htmlwidgets_0.8         memoise_1.1.0           latticeExtra_0.6-28     knitr_1.16             
[25] geneplotter_1.55.0      AnnotationDbi_1.39.1    htmlTable_1.9           Rcpp_0.12.11           
[29] acepack_1.4.1           xtable_1.8-2            scales_0.4.1            backports_1.1.0        
[33] checkmate_1.8.2         Hmisc_4.0-3             annotate_1.55.0         XVector_0.17.0         
[37] gridExtra_2.2.1         ggplot2_2.2.1           digest_0.6.12           stringi_1.1.5          
[41] grid_3.4.0              tools_3.4.0             bitops_1.0-6            magrittr_1.5           
[45] RSQLite_1.1-2           lazyeval_0.2.0          RCurl_1.95-4.8          tibble_1.3.3           
[49] Formula_1.2-1           cluster_2.0.6           Matrix_1.2-10           data.table_1.10.4      
[53] rpart_4.1-11            nnet_7.3-12             compiler_3.4.0
RNA-Seq R DESeq2 rna-seq • 3.6k views
ADD COMMENT
1
Entering edit mode

Can you just clarify that you're trying to do differential transcript expression using DESeq2? - The negative binomial distribution that's used for counts has not been tested on transcript level counts as far as I know. I don't think that you should expect results to be similar. You could import your transcript counts, voom transform them and try this approach in Limma, which might be a more robust approach.

ADD REPLY
0
Entering edit mode

Yes, it is a differential transcript expression - and I am aware of the differences. I did this approach mostly to test out tximport.

I found out the lfcShrink() function was the difference from earlier version.

ADD REPLY
1
Entering edit mode

Hello BioBing!

It appears that your post has been cross-posted to another site: https://support.bioconductor.org/p/96750/

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Hi WouterDeCoster,

Yes, sorry. I was not sure where to post it - so to be sure I posted it both places (I found a post after this one where someone had written that is was better to post questions about DESeq it on Bioconductor).

Thanks :-)

ADD REPLY
0
Entering edit mode

It definitely makes more sense to post it at Bioconductor Support, but it's not so nice to have both communities "work" on your question. So that's why it's best to avoid cross posting.

Make sure to add the answer in this thread as soon as you get one at Bioconductor.

ADD REPLY
0
Entering edit mode

The answer is already provided, the OP should add it here.

ADD REPLY
2
Entering edit mode
7.5 years ago
IP ▴ 770

Do you know what was the previous version of the DEseq2 installed in your system? In the changelog of DEseq2 they say that in version 1.16.0 the default setting of the beta prior will be "False".

Furthermore , they write the following: "The recommended pipeline will be to use lfcShrink() for producing shrunken LFC.". I am not sure if this would produce the changes you see in your dataset, but I think that it could be an explanation.

I will try to use DEseq version 1.16.0 and then use the previous one (version 1.15.40) to see if this explains the differences you see.

ADD COMMENT
0
Entering edit mode

Thank you! I was not aware of the lfcShrink() function - that changed everything!

ADD REPLY

Login before adding your answer.

Traffic: 2943 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6