Using RaceID3 with 10x datasets
Seigfried, 4.9 years ago

Hello, I want to cluster my 10x single-cell data using RaceID3. However, I cannot load the data into RaceID with its SCseq function.

The 10x pipeline gave me three files: barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz.

I used Seurat's Read10X function:

library(Seurat)
library(RaceID)

# Read10X returns a sparse dgCMatrix (genes x barcodes)
pbmc.data <- Read10X(data.dir = "C:/Users/s/Downloads/")

sc <- SCseq(pbmc.data)

This is what pbmc.data looks like:

> pbmc.data
33694 x 27179520 sparse Matrix of class "dgCMatrix"

This is the error I get:

sc <- SCseq(pbmc.data)
Error in asMethod(object) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105

I also tried reading the files directly with the Matrix package:

library(Matrix)

matrix_dir <- "C:/Users/s/Downloads/"
barcode.path  <- paste0(matrix_dir, "barcodes.tsv.gz")
features.path <- paste0(matrix_dir, "features.tsv.gz")
matrix.path   <- paste0(matrix_dir, "matrix.mtx.gz")

# Sparse genes x barcodes count matrix in Matrix Market format
mat <- readMM(file = matrix.path)

# Column V1 of features.tsv holds the gene IDs; barcodes.tsv has one barcode per line
feature.names <- read.delim(features.path, header = FALSE, stringsAsFactors = FALSE)
barcode.names <- read.delim(barcode.path, header = FALSE, stringsAsFactors = FALSE)

colnames(mat) <- barcode.names$V1
rownames(mat) <- feature.names$V1

SCseq then fails trying to allocate a huge amount of memory:

> sc <- SCseq(mat)
Error: cannot allocate vector of size 6823.1 Gb

I understand that RaceID accepts a sparse matrix, which I am already providing. Can someone please explain why it needs so much memory?
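A likely explanation for both errors: the Cholmod message is raised from cholmod_dense.c inside asMethod(object), which suggests that SCseq converts its input into an ordinary dense matrix, so handing it a sparse matrix does not avoid the memory cost. This is inferred from the error messages, not from the RaceID source. The sketch below uses a small random matrix from Matrix::rsparsematrix (a toy stand-in with random values, not the poster's data) to show how much larger the dense copy becomes.

```r
library(Matrix)

set.seed(1)
# Toy stand-in for a counts matrix: 2000 "genes" x 5000 "barcodes", 1% non-zero
# (random values, not real counts; only the storage size matters here)
m <- rsparsematrix(2000, 5000, density = 0.01)

# A dgCMatrix stores only the non-zero values plus their index vectors
sparse_bytes <- as.numeric(object.size(m))

# as.matrix() materialises every entry, zeros included, as an 8-byte double
dense_bytes <- as.numeric(object.size(as.matrix(m)))

cat(sprintf("sparse: %.1f MB, dense: %.1f MB\n",
            sparse_bytes / 1e6, dense_bytes / 1e6))
```

The dense size grows with genes × barcodes regardless of how many entries are zero, which is why a matrix with millions of barcode columns explodes once it is densified.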

Answer by Devon Ryan, 4.9 years ago

RaceID is requesting about 7 TB of RAM to load that dataset, which is almost certainly more than you have. From experience, RaceID3 does not currently scale well to 10x-sized data: besides needing absurd amounts of RAM, it will take a LOT of time to run. I recommend switching to a different tool for this kind of data.
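The 7 TB figure follows directly from the matrix dimensions in the question, assuming the data end up stored as a dense matrix of 8-byte doubles. The sketch below reproduces the number from the "cannot allocate vector of size 6823.1 Gb" error; R reports that size in units of 1024^3 bytes.

```r
# Dimensions reported for pbmc.data in the question
n_genes    <- 33694
n_barcodes <- 27179520

# A dense numeric matrix stores every entry as an 8-byte double
dense_bytes <- n_genes * n_barcodes * 8

dense_gib <- dense_bytes / 1024^3   # units used in R's allocation error
dense_tb  <- dense_bytes / 1e12     # decimal terabytes

cat(sprintf("%.1f GiB (%.2f TB)\n", dense_gib, dense_tb))
```

This prints 6823.1 GiB (7.33 TB), matching the size in the allocation error.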

Thank you for your reply, @Devon Ryan.

The count matrix I am using is a "cellranger aggr" aggregate of four different samples. I tried using Seurat for clustering, but since my samples are cell culture samples grown under different conditions, they do not cluster well.

I could run this on a single sample, but that would defeat the purpose of identifying cell lineages.

Could you please recommend other tools I can use for this? I am currently trying out Slingshot.

Wishing you a Happy New Year and Decade!

Spend more time tuning the parameters in Seurat, including how you handle batches (i.e., samples). You can also try tools such as Scanorama and Scanpy.
