Dear community,
I want to use the ConsensusClusterPlus R package to estimate the number of clusters for a multi-parameter cytometry data set (55,000 cells x 46 markers). The problem is that running the ConsensusClusterPlus function takes ages, and after 1-2 hours the error message "Memory limit reached" appears.
I confirmed that I can run the data set from the vignette tutorial (microarray data, 128 samples x 5000 genes), which takes only a few seconds. For testing purposes I downscaled my data set from 55,000 cells x 46 markers to (1) 5,000 cells x 46 markers and (2) 128 cells x 46 markers. Only the last version with 128 cells x 46 markers finishes within a reasonable time and without an error message.
But I doubt that clustering more than 128 single cells is impossible, as I've seen ConsensusClusterPlus used for clustering scRNAseq data with many more than 128 cells.
This is my code:
results <- ConsensusClusterPlus(data, maxK=6, reps=50, pItem=0.8, pFeature=1, title=title, clusterAlg="hc", distance="pearson", seed=1262118388.71279, plot="png")
I was actually hoping to use my full data set for clustering (55,000 cells x 46 markers), increase maxK to 20, and possibly also increase reps (1000, as recommended by the authors). But those settings would be even more resource-hungry. I've tried it on a laptop (Intel Core i7, 4 x 2.5 GHz, 16 GB RAM) as well as a desktop computer (Intel quad-core i5, 4 x 3.2 GHz, 32 GB RAM).
I would be grateful for any advice. Thanks in advance!!
1) Are you running this on Windows, macOS or Linux? 2) Do you have a 64-bit OS?
macOS Catalina 10.15.3 64-bit on my laptop, and Windows 10 64-bit on my desktop PC.
It seems like you need a bigger machine. I am also not sure whether your machine is restricting the memory available to R.
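To put a rough number on "bigger machine": the consensus matrix is cells x cells, so a single dense copy for 55,000 cells takes 55,000^2 x 8 bytes, about 24.2 GB, which already exceeds the 16 GB laptop and leaves little room on the 32 GB desktop. The sketch below does that arithmetic and then runs the same call on a random subset of cells. It is only an illustration under assumptions: `dense_matrix_gb`, `max_items_for_budget` and `subsample_cells` are hypothetical helpers, not part of ConsensusClusterPlus; `n_matrices` is a guess at how many such matrices are held at once; and `toy` is random data standing in for the real matrix. Note that ConsensusClusterPlus expects items (here cells) in columns and features (markers) in rows, so a cells x markers matrix would need `t()` first.

```r
# Size in GB of one dense n x n matrix of doubles (8 bytes per entry).
dense_matrix_gb <- function(n_items) n_items^2 * 8 / 1e9

# Largest number of items for which `n_matrices` such matrices fit in `budget_gb`.
# `n_matrices` is an assumption, not a documented property of the package.
max_items_for_budget <- function(budget_gb, n_matrices = 1) {
  floor(sqrt(budget_gb * 1e9 / (8 * n_matrices)))
}

# Memory for one cells x cells matrix at the three data set sizes from the question.
sizes <- data.frame(cells = c(128, 5000, 55000))
sizes$gb_per_matrix <- dense_matrix_gb(sizes$cells)
print(sizes)

# Hypothetical helper: keep a random subset of columns (cells).
subsample_cells <- function(mat, n_keep) {
  keep <- sample(ncol(mat), min(n_keep, ncol(mat)))
  mat[, keep, drop = FALSE]
}

# Toy stand-in for the real data: 46 markers (rows) x 2000 cells (columns).
set.seed(1)
toy <- matrix(
  rnorm(46 * 2000), nrow = 46,
  dimnames = list(paste0("marker", 1:46), paste0("cell", 1:2000))
)
sub <- subsample_cells(toy, 300)

# Same settings as in the question, run on the subset only if the package is installed.
if (requireNamespace("ConsensusClusterPlus", quietly = TRUE)) {
  results <- ConsensusClusterPlus::ConsensusClusterPlus(
    sub, maxK = 6, reps = 50, pItem = 0.8, pFeature = 1,
    clusterAlg = "hc", distance = "pearson", seed = 1262118388.71279
  )
}
```

Whether clusters found on a subset carry over to the full 55,000 cells is a separate question that this sketch does not address; it only shows how the memory requirement scales with the number of cells.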