Hi All,
Newbie here - I am currently running Seurat on an RStudio server that has 3TB of RAM, 4 Intel Xeon CPUs with 24 cores. I am running 53 samples.
When I run the IntegrateData step, I keep receiving the following error:
Integrating data
Merging dataset 51 53 49 52 50 45 47 46 48 44 54 42 11 36 26 25 10 33 14 12 24 into 3 4 23 21 41 29 1 30 9 20 13
15 2 43 5 34 18 17 31 35 19 7 32 28 6 22 27 8 16 58 61 56 57 59
Extracting anchors for merged samples
Finding integration vectors
Error in subCsp_rows(x, i, drop = drop) :
Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89
I am currently running the following:
Immune.features <- SelectIntegrationFeatures(object.list = sample_all_v2, nfeatures = 3000)
options(future.globals.maxSize = 4800 * 1024^2)
Immune.list <- PrepSCTIntegration(object.list = sample_all_v2, anchor.features = Immune.features,
                                  verbose = TRUE)
pbmc.anchors <- FindIntegrationAnchors(object.list = Immune.list, normalization.method = "SCT",
                                       anchor.features = Immune.features)
pbmc.integrated <- IntegrateData(anchorset = pbmc.anchors, normalization.method = "SCT")
According to this issue from the Seurat GitHub, downsampling is recommended, which I performed as follows:
pbmc <- subset(pbmc, subset = nFeature_RNA > 200)
pbmc.list <- SplitObject(pbmc, split.by = "Method")
for (i in names(pbmc.list)) {
  pbmc.list[[i]] <- SCTransform(pbmc.list[[i]], verbose = TRUE)
}
pbmc.features <- SelectIntegrationFeatures(object.list = pbmc.list, nfeatures = 3000)
pbmc.list <- PrepSCTIntegration(object.list = pbmc.list, anchor.features = pbmc.features)
table(Idents(pbmc.list[[9]]))  # pbmc2 3327
# Downsampling
pbmc.list_v2 <- lapply(X = pbmc.list,
                       FUN = subset,
                       downsample = 1000)
table(Idents(pbmc.list_v2[[8]]))  # pbmc1 1000 pbmc2 1000
# Downsampled
sample_all_v2 <- lapply(X = sample_all,
                        FUN = subset,
                        downsample = 1000)
table(Idents(sample_all_v2[[53]]))  # pbmc1 253 pbmc2 273
However, I am still getting the aforementioned error. Is there any way to solve this issue?
Thank you!
Thank you for the recommendation, fracar8! Is there a way to check what samples the reference is alluding to from the list when it specifies, for example, "reference = c(1,2)"? Thanks
You are storing the samples inside a list, so c(1,2) means the first and the second elements of that list. What I usually do is add the sample names to the list and then pass the reference as c("healthy-control2","control1","control2")
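To make the mapping between names and positions concrete, here is a minimal sketch in base R. The list contents are placeholder strings standing in for Seurat objects, and the sample name "patient1" is made up for illustration; only the name-to-index lookup is the point. Converting names to positions with match() should work regardless of whether your Seurat version expects numeric indices for reference.

```r
# Hypothetical stand-ins for Seurat objects; in practice each element
# would be one sample's Seurat object.
sample_list <- list("obj_a", "obj_b", "obj_c", "obj_d")

# Name each list element after its sample ("patient1" is an invented name).
names(sample_list) <- c("control1", "control2", "healthy-control2", "patient1")

# Show which position each sample occupies in the list.
print(data.frame(index = seq_along(sample_list), sample = names(sample_list)))

# Translate the reference sample names into list positions.
ref_names <- c("healthy-control2", "control1", "control2")
ref_idx <- match(ref_names, names(sample_list))
stopifnot(!anyNA(ref_idx))  # fail early if a name is misspelled

print(ref_idx)

# ref_idx can then be supplied as the reference, for example:
# FindIntegrationAnchors(object.list = sample_list, reference = ref_idx, ...)
```

Printing the index table first is a cheap way to confirm that "reference = c(1,2)" points at the samples you intend before starting a long integration run.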
Got it, thanks! For some reason I had a lapse in thinking and forgot it was referencing the samples in the order I placed them. Appreciate the help and patience!