How to get parameter for cluster in scCATCH tool
2.2 years ago
Chris ▴ 340

Hi all,

I would like to annotate cell types in scRNA-seq data, so I am following the tutorial below:

https://cran.r-project.org/web/packages/scCATCH/vignettes/tutorial.html
obj <- createscCATCH(data = mouse_kidney_203, cluster = mouse_kidney_203_cluster)

However, I don't know how to obtain the input for the cluster argument from my own data. In the tutorial, mouse_kidney_203_cluster already exists once the example data is loaded.

Thank you so much!

scCATCH • 2.6k views
2.2 years ago

Hi,

In the case you mentioned, mouse_kidney_203 is an object of class dgCMatrix, i.e., a sparse matrix of log-normalized gene expression values in which columns represent cells and rows represent genes. mouse_kidney_203_cluster is a character vector of cluster annotations whose length equals the number of columns in the matrix, i.e., one entry per cell, in the same order as the cell column names of the gene expression matrix. These are the data that you need to provide.

Which type of data do you have? A Seurat object or a SingleCellExperiment object?

Depending on the software you use to obtain clusters, this information will be stored in different places. You just need to get it and provide that to the software as a character vector.
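As a rough illustration of the two inputs described above, here is a minimal sketch that builds a toy sparse matrix and a matching cluster vector with the Matrix package. The object names expr and clusters and all values are made up for the example; they are not part of scCATCH or of the tutorial data.

```r
library(Matrix)

# Toy stand-in for a log-normalized expression matrix: 5 genes x 4 cells.
# All names and values here are made up for illustration.
expr <- Matrix(
  c(0,   1.2, 0,   0,
    2.1, 0,   0,   0.7,
    0,   0,   3.3, 0,
    0.5, 0,   0,   0,
    0,   0.9, 0,   1.8),
  nrow = 5, ncol = 4, byrow = TRUE, sparse = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:4))
)

# One cluster label per cell, in the same order as the matrix columns.
clusters <- c("1", "1", "2", "2")

class(expr)                      # sparse matrix class from the Matrix package
is.character(clusters)           # TRUE
length(clusters) == ncol(expr)   # TRUE
```

The last two checks are the ones worth repeating on real data before calling createscCATCH().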

I hope this helps.

António


Hi António,

Thank you so much for your answer! I have a Seurat object. Because I filtered out some cells after creating the Seurat object, I got an error saying that the number of rows is not equal to the number of columns. Do you have any suggestions to fix this? It would not make sense to skip the filtering just to keep the number of rows equal to the number of columns. The data argument of createscCATCH() does not accept the Large Seurat object (with cells filtered out); it only accepts the Large dgCMatrix (from which cells have not been filtered out).


"I got the error number of rows is not equal number of columns."

Where did you get this error?

Without seeing your code or a minimal reproducible example, I can't help you much further; I can only speculate.

Let's imagine that you have a Seurat object called seu, with the cluster annotations saved in the meta.data slot under the name seurat_clusters.

You can get the log-normalized data as a matrix of class dgCMatrix (assuming that you first ran seu <- NormalizeData(seu) and that your data is stored in the RNA assay) by doing:

counts.sparse <- seu@assays$RNA@data

If you prefer, you can use the function GetAssayData(seu, slot="data") instead (see its documentation for more examples; in Seurat v5 the argument is named layer rather than slot).

This is the data that you need to pass to the first argument, data, of the function createscCATCH().

To get the cluster annotations you can try (assuming that your cluster annotations are saved in the meta.data slot of your Seurat object with the name seurat_clusters):

cluster_ids <- as.character(seu@meta.data$seurat_clusters)

This information can be given to the argument cluster in the function createscCATCH().

Therefore, you can attempt to run scCATCH by doing:

obj <- createscCATCH(data = counts.sparse, cluster = cluster_ids)

Now, if you filtered out some cells from your Seurat object, I believe that your meta.data should still match your data slot.

You can check this by doing:

all(colnames(seu@assays$RNA@data)==row.names(seu@meta.data))

The above line of code should return TRUE if cell labels/names/barcodes between normalized data and meta.data are exactly the same.
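If that check returns FALSE, one possible way to bring the two inputs back in line is to keep only the cells present in both and to order them identically. The sketch below uses a toy matrix and a toy metadata table instead of a real Seurat object; expr, meta, meta_filtered and shared are hypothetical names chosen for the example.

```r
library(Matrix)

# Toy stand-ins (hypothetical names): a 3 genes x 5 cells sparse matrix and
# a metadata table with one row per cell, like meta.data in a Seurat object.
expr <- Matrix(c(1, 0, 2, 0, 0, 3, 4, 0, 0, 0, 5, 0, 0, 6, 0),
               nrow = 3, ncol = 5, sparse = TRUE,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))
meta <- data.frame(seurat_clusters = factor(c(0, 0, 1, 1, 2)),
                   row.names = paste0("cell", 1:5))

# Suppose filtering removed cell2 and cell4 from the metadata only.
meta_filtered <- meta[c("cell1", "cell3", "cell5"), , drop = FALSE]

# Keep only the cells present in both objects, in the same order.
shared <- intersect(colnames(expr), rownames(meta_filtered))
expr_aligned <- expr[, shared, drop = FALSE]
cluster_ids <- as.character(meta_filtered[shared, "seurat_clusters"])

identical(colnames(expr_aligned), shared)    # TRUE
length(cluster_ids) == ncol(expr_aligned)    # TRUE
```

Subsetting both objects by the same vector of cell names, rather than by position, avoids silently pairing a cell with the wrong cluster label.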

I hope this helps,

António


Thank you, António, for your very detailed answer! I was able to fix the error. However, I got a new error:

Error: vector memory exhausted (limit reached?)
In addition: Warning message:
In asMethod(object) :
sparse->dense coercion: allocating vector of size 5.6 GiB

When I run:

my_obj <- findmarkergene(object = my_obj, species = "Human", marker = cellmatch, tissue = "Vein")

I have had this error before but am not sure of the solution. I tried this:

https://stackoverflow.com/questions/51295402/r-on-macos-error-vector-memory-exhausted-limit-reached

But it didn't work.


I'm glad this helped.

Regarding the new error, I would do the following:

(1) run the tutorial with the dummy/tutorial data to see if that works and to rule out any problem with installation, dependencies, etc.

(2) if running the analysis on your whole data set does not work and you suspect a memory issue, you can still try to run the software on a fraction of your data. Of course, your aim is to run it on all your cells, but running it on a fraction would tell you whether the problem lies in your data or really is a memory issue, as it seems to be. You can subset your matrix and vector to the first 2000 or 5000 cells by doing:

obj <- createscCATCH(data = counts.sparse[,1:2000], cluster = cluster_ids[1:2000])

(3) check the stackoverflow post that you shared and attempt carefully what is recommended there (I'm sorry but I can't add anything to what is there - I never faced this exact error mostly because I work with medium size data sets in a cluster).

(4) if (3) does not work, try to get access to a computer with more RAM.
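As a variation on step (2), the first 2000 cells may not contain every cluster if the cells happen to be ordered in some way, so a random draw is one alternative. The sketch below runs on toy data; expr, n_keep and keep are hypothetical names, and the sizes are deliberately tiny.

```r
library(Matrix)

set.seed(1)  # make the random draw reproducible

# Toy stand-ins (hypothetical): 10 genes x 100 cells and one label per cell.
expr <- rsparsematrix(10, 100, density = 0.2)
colnames(expr) <- paste0("cell", 1:100)
cluster_ids <- rep(c("0", "1", "2", "3"), each = 25)

# Draw the same cell indices for both inputs so they stay aligned.
n_keep <- 20
keep <- sort(sample(ncol(expr), n_keep))
expr_small <- expr[, keep, drop = FALSE]
cluster_small <- cluster_ids[keep]

dim(expr_small)        # 10 genes, 20 cells
table(cluster_small)   # how many sampled cells fall in each cluster
```

The table() call shows whether every cluster is still represented in the sample before it is passed on.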

I hope this helps,

António


Hi António,

Thank you so much for your detailed answer!

There is no error with the tutorial data. Your solution worked!

I got a new error message when I run:

my_obj <- findmarkergene(object = my_obj, species = "Human", marker = cellmatch, tissue = "Blood")

There are 1255 potential marker genes in CellMatch database for Human on Blood.
Error in $<-.data.frame(*tmp*, "cluster", value = character(0)) : replacement has 0 rows, data has 979


What is the result of the following (assuming that you are using this vector to provide the cluster annotations and subsetting it to the first 2,000 cells)?

table(cluster_ids[1:2000])

Again, without knowing your data I can't say much, only speculate about it.

Is cluster_ids of class character?

class(cluster_ids)

António


cluster_ids is a factor whose number of levels equals the number of clusters. I got this vector from the column seurat_clusters. However, when I took the vector from the column condition_clusters (which prefixes the cluster number with the condition), I got a character vector and it worked!
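Since a factor failed here while a character vector worked, converting the factor explicitly is one way to keep using seurat_clusters. A minimal base R sketch on a toy factor (cluster_factor is a hypothetical name):

```r
# Toy factor, like the seurat_clusters column of a Seurat meta.data table.
cluster_factor <- factor(c(0, 0, 1, 2, 2))

class(cluster_factor)    # "factor"

# as.character() returns the level labels as a character vector.
cluster_ids <- as.character(cluster_factor)
class(cluster_ids)       # "character"
cluster_ids              # "0" "0" "1" "2" "2"

# as.integer() would instead return the internal level codes (1 1 2 3 3),
# which are not the cluster labels.
as.integer(cluster_factor)
```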

My data is from vein, so tissue = "Blood" worked, but tissue = "Vein" gave an error:

Error in .filter_marker(marker, species, cancer, tissue) : Vein, not matched with the tissue types in CellMatch database!


The tissue needs to match one of the tissues available in their database.

In the following link you can find all the tissues available ("vein" alone does not appear listed): https://github.com/ZJUFanLab/scCATCH/wiki/human_tissues
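A tissue name can also be checked in R before calling findmarkergene(). The sketch below runs on a made-up marker table; the column names species and tissue are an assumption about how such a table is laid out, so inspect the real cellmatch object (for example with colnames()) before adapting it.

```r
# Toy stand-in for a marker table. The rows are made up, and the column
# names species and tissue are an assumption; check the real table first.
markers <- data.frame(
  species = c("Human", "Human", "Human", "Mouse"),
  tissue  = c("Blood", "Blood vessel", "Kidney", "Kidney")
)

wanted <- "Vein"
available <- sort(unique(markers$tissue[markers$species == "Human"]))

wanted %in% available    # FALSE: "Vein" is not among the listed tissues

# Case-insensitive partial match to find related tissue names.
grep("blood", available, ignore.case = TRUE, value = TRUE)
```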

I hope this helps,

António
