Due to the Covid-19 situation, I am working partly at the lab and partly at home. At the lab I work in a Linux environment on an HP workstation, and at home on an MSI machine with Windows 10 and RStudio.
As input to the SCTransform function I use the same RDS object, same parameters, same R version and same Seurat version, but I get slightly different outputs, which lead to different UMAPs.
I have set the same seed.use parameter for SCTransform in both environments.
From the lab:
library(Seurat)
SCRNARR.list <- readRDS(file = "/mnt/raid1/Data/SCRNARR/SCRNARR_BeforeSCT.rds")
object.size(SCRNARR.list)
944243832 bytes
for (sample in 1:length(SCRNARR.list)) {
  SCRNARR.list[[sample]] <- SCTransform(SCRNARR.list[[sample]], seed.use = 1447854, variable.features.n = 2000, vars.to.regress = "percent.mt", verbose = FALSE)
}
There were 50 or more warnings (use warnings() to see the first 50)
object.size(SCRNARR.list)
2131213616 bytes
From home:
library(Seurat)
SCRNARR.list <- readRDS(file = "D://KI/VM/Data/SCRNARR/SCRNARR_BeforeSCT.rds")
object.size(SCRNARR.list)
944243832 bytes
for (sample in 1:length(SCRNARR.list)) {
  SCRNARR.list[[sample]] <- SCTransform(SCRNARR.list[[sample]], seed.use = 1447854, variable.features.n = 2000, vars.to.regress = "percent.mt", verbose = FALSE)
}
There were 50 or more warnings (use warnings() to see the first 50)
object.size(SCRNARR.list)
2131212976 bytes
I traced back to the first point where the two environments diverge, and SCTransform is the first function whose output objects differ in size, which then leads to different UMAP layouts.
The issue does not come from the randomness of the UMAP computation either: at home I always get the same UMAP, and at the office I always get the same UMAP, but the UMAPs from the office and the UMAPs from home always differ from each other.
Could it be related to the computers themselves, for example how each one handles floating-point arithmetic?
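To illustrate why floating-point handling could plausibly matter: double-precision addition is not associative, so the same numbers combined in a different order (for example by a different BLAS build or compiler) can differ in the last bits. The snippet below is a generic base-R sketch, not something taken from SCTransform itself.

```r
# Floating-point addition is not associative: grouping changes the last bits.
a <- (0.1 + 0.2) + 0.3
b <- 0.1 + (0.2 + 0.3)

exact_match <- identical(a, b)          # bitwise comparison
close_match <- isTRUE(all.equal(a, b))  # comparison within a small tolerance

print(exact_match)
print(close_match)
print(a - b)
```

The exact comparison fails while the tolerance-based one passes. Differences this small can still change a ranking or a threshold decision further down a pipeline.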
Thanks!
Specifications:
From the lab:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=sv_SE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=sv_SE.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=sv_SE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=sv_SE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Seurat_3.2.0
loaded via a namespace (and not attached):
[1] httr_1.4.2 tidyr_1.1.1 jsonlite_1.7.0
[4] viridisLite_0.3.0 splines_4.0.2 leiden_0.3.3
[7] shiny_1.5.0 ggrepel_0.8.2 globals_0.12.5
[10] pillar_1.4.6 lattice_0.20-41 glue_1.4.1
[13] reticulate_1.16 digest_0.6.25 polyclip_1.10-0
[16] RColorBrewer_1.1-2 promises_1.1.1 colorspace_1.4-1
[19] cowplot_1.0.0 htmltools_0.5.0 httpuv_1.5.4
[22] Matrix_1.2-18 plyr_1.8.6 pkgconfig_2.0.3
[25] listenv_0.8.0 purrr_0.3.4 xtable_1.8-4
[28] patchwork_1.0.1 scales_1.1.1 RANN_2.6.1
[31] tensor_1.5 later_1.1.0.1 Rtsne_0.15
[34] spatstat.utils_1.17-0 tibble_3.0.3 mgcv_1.8-31
[37] generics_0.0.2 ggplot2_3.3.2 ellipsis_0.3.1
[40] ROCR_1.0-11 pbapply_1.4-3 lazyeval_0.2.2
[43] deldir_0.1-28 survival_3.1-12 magrittr_1.5
[46] crayon_1.3.4 mime_0.9 future_1.18.0
[49] nlme_3.1-147 MASS_7.3-51.6 ica_1.0-2
[52] tools_4.0.2 fitdistrplus_1.1-1 data.table_1.13.0
[55] lifecycle_0.2.0 stringr_1.4.0 plotly_4.9.2.1
[58] munsell_0.5.0 cluster_2.1.0 irlba_2.3.3
[61] compiler_4.0.2 rsvd_1.0.3 rlang_0.4.7
[64] grid_4.0.2 ggridges_0.5.2 goftest_1.2-2
[67] RcppAnnoy_0.0.16 rappdirs_0.3.1 htmlwidgets_1.5.1
[70] igraph_1.2.5 miniUI_0.1.1.1 gtable_0.3.0
[73] codetools_0.2-16 abind_1.4-5 reshape2_1.4.4
[76] R6_2.4.1 gridExtra_2.3 zoo_1.8-8
[79] dplyr_1.0.2 uwot_0.1.8 fastmap_1.0.1
[82] future.apply_1.6.0 KernSmooth_2.23-17 ape_5.4-1
[85] spatstat.data_1.4-3 stringi_1.4.6 spatstat_1.64-1
[88] parallel_4.0.2 Rcpp_1.0.5 rpart_4.1-15
[91] vctrs_0.3.2 sctransform_0.2.1 png_0.1-7
[94] tidyselect_1.1.0 lmtest_0.9-37
From home:
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252 LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Seurat_3.2.0
loaded via a namespace (and not attached):
[1] httr_1.4.2 tidyr_1.1.2 jsonlite_1.7.0 viridisLite_0.3.0
[5] splines_4.0.2 leiden_0.3.3 shiny_1.5.0 ggrepel_0.8.2
[9] globals_0.12.5 pillar_1.4.6 lattice_0.20-41 glue_1.4.2
[13] reticulate_1.16 digest_0.6.25 polyclip_1.10-0 RColorBrewer_1.1-2
[17] promises_1.1.1 colorspace_1.4-1 cowplot_1.0.0 htmltools_0.5.0
[21] httpuv_1.5.4 Matrix_1.2-18 plyr_1.8.6 pkgconfig_2.0.3
[25] listenv_0.8.0 purrr_0.3.4 xtable_1.8-4 patchwork_1.0.1
[29] scales_1.1.1 RANN_2.6.1 tensor_1.5 later_1.1.0.1
[33] Rtsne_0.15 spatstat.utils_1.17-0 tibble_3.0.3 mgcv_1.8-31
[37] generics_0.0.2 ggplot2_3.3.2 ellipsis_0.3.1 ROCR_1.0-11
[41] pbapply_1.4-3 lazyeval_0.2.2 deldir_0.1-28 survival_3.1-12
[45] magrittr_1.5 crayon_1.3.4 mime_0.9 future_1.18.0
[49] nlme_3.1-148 MASS_7.3-51.6 ica_1.0-2 tools_4.0.2
[53] fitdistrplus_1.1-1 data.table_1.13.0 lifecycle_0.2.0 stringr_1.4.0
[57] plotly_4.9.2.1 munsell_0.5.0 cluster_2.1.0 irlba_2.3.3
[61] compiler_4.0.2 rsvd_1.0.3 rlang_0.4.7 grid_4.0.2
[65] ggridges_0.5.2 rstudioapi_0.11 goftest_1.2-2 RcppAnnoy_0.0.16
[69] rappdirs_0.3.1 htmlwidgets_1.5.1 igraph_1.2.5 miniUI_0.1.1.1
[73] gtable_0.3.0 codetools_0.2-16 abind_1.4-5 reshape2_1.4.4
[77] R6_2.4.1 gridExtra_2.3 zoo_1.8-8 dplyr_1.0.2
[81] uwot_0.1.8 fastmap_1.0.1 future.apply_1.6.0 KernSmooth_2.23-17
[85] ape_5.4-1 spatstat.data_1.4-3 stringi_1.4.6 spatstat_1.64-1
[89] parallel_4.0.2 Rcpp_1.0.5 rpart_4.1-15 vctrs_0.3.2
[93] sctransform_0.2.1 png_0.1-7 tidyselect_1.1.0 lmtest_0.9-37
Thank you, really interesting thinking! I dug a bit more into SCTransform, and it seems the HVGs are not always the same, see below.
The warnings are the same in both cases: the same line is repeated, except for warning 31:
Checking towards SCtransform
I have 7 different samples in SCRNARR.list.
I only plotted UMAPs for samples 1 to 4, and the screenshot in the main thread comes from sample 3.
I finished the analysis for both the LAB and HOME objects after SCTransform on my office workstation.
For sample 3, the LAB object produced an Africa-shaped cluster in the expected orientation, while the HOME object produced the same Africa-shaped cluster reversed.
How can the SCTransform function choose different HVGs from the exact same input objects and parameters?
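One way to quantify how much the HVG lists differ is to compare them as sets. This is a minimal sketch using base R only; `compare_hvgs` and the gene names are made up for illustration, and `lab_obj` / `home_obj` in the final comment are hypothetical names for the two Seurat objects.

```r
# Compare two sets of variable features (character vectors of gene names).
compare_hvgs <- function(hvg_a, hvg_b) {
  list(
    n_shared   = length(intersect(hvg_a, hvg_b)),  # genes present in both
    only_a     = setdiff(hvg_a, hvg_b),            # genes only in the first set
    only_b     = setdiff(hvg_b, hvg_a),            # genes only in the second set
    same_order = identical(hvg_a, hvg_b)           # same genes in the same order
  )
}

# Toy example with made-up gene names
lab_hvg  <- c("GeneA", "GeneB", "GeneC", "GeneD")
home_hvg <- c("GeneA", "GeneB", "GeneD", "GeneE")
res <- compare_hvgs(lab_hvg, home_hvg)
str(res)

# With real objects (hypothetical names), something like:
# compare_hvgs(VariableFeatures(lab_obj), VariableFeatures(home_obj))
```

If only a handful of genes near the bottom of the 2000-gene list differ, that would be consistent with tiny numerical differences shifting the ranking at the cutoff.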
Hmm, good question. Are the results the same if you run it several times on the same workstation? Maybe test on a single sample. Just mclapply it 10 times, and see whether the function itself is fully deterministic.

Yeah, the objects are the same size every time at the lab and the same size every time at home, but the object sizes from the lab differ from the ones I get from home.
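The repeated-run check suggested above could be sketched roughly as follows. It compares the content of each result with `identical()` rather than its size. `is_deterministic` is a hypothetical helper, and plain `lapply` is used instead of `mclapply` so that it also runs on Windows; the seeded random draw is only a stand-in for the real SCTransform call.

```r
# Run a function several times and check that every result is identical
# to the first one (content comparison, not object size).
is_deterministic <- function(f, n = 10) {
  results <- lapply(seq_len(n), function(i) f())
  all(vapply(results[-1], identical, logical(1), results[[1]]))
}

# Toy stand-in for the real call: a seeded random draw is reproducible.
seeded <- function() {
  set.seed(1447854)
  rnorm(5)
}
seeded_ok <- is_deterministic(seeded)
print(seeded_ok)

# For the real case, an assumption about how one might wrap a single sample.
# Comparing an extracted matrix avoids metadata such as timestamps:
# is_deterministic(function() {
#   obj <- SCTransform(SCRNARR.list[[3]], seed.use = 1447854,
#                      vars.to.regress = "percent.mt", verbose = FALSE)
#   GetAssayData(obj, assay = "SCT", slot = "scale.data")
# })
```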
I would check content, not size. Size is almost never a good indicator for anything: