Hi, I have data from a specific cell from mice fed a certain diet. I integrated 4 datasets that were measured at four different time points for an integrated single-cell RNA-seq analysis. I have been following the Seurat vignette: https://satijalab.org/seurat/v3.1/immune_alignment.html.
I am using SingleR to identify the cell type of each cluster, and I am wondering whether I need to set DefaultAssay to "RNA" or "integrated". I tried both, but they gave me slightly different cell type identifications.
Should I keep DefaultAssay as "RNA" or "integrated"?
Any thoughts and advice are greatly appreciated. Thank you.
Just to add: use log2 expression values.
Good point. I was just concerned with "RNA" vs "integrated".
Sorry for the silly question, but what do you mean by log2 expression values? Do I take log2 of the data slot under the RNA assay, or of the counts slot under the RNA assay?
But we perform integration when we are trying to align cell states shared across datasets. If we use the "RNA" assay, what is the point of integration?
Integration aims to create a common clustering landscape in which all cells are embedded. This makes it easy to compare cells which (without integration) would cluster based on batch effects, biological differences such as cell cycle, or all kinds of sample-specific differences. To my knowledge, the integrated values must not be used with differential analysis methods, since the integration process creates dependencies between values and notably changes the magnitude and direction of expression changes. They therefore probably should not be used with classifiers such as SingleR either, which operates on Spearman correlation and would suffer from those changes in the magnitude and direction of the expression values.
To think of it another way: if you are trying to identify your cells based on independent datasets, it probably makes more sense to use the stable normalized values. The integrated values will change depending on the datasets you are integrating, which means your cell type assignments will also change, and that does not seem reasonable.
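To make the Spearman point concrete, here is a minimal sketch in plain NumPy. Everything in it is hypothetical: the six "genes", the expression values, and the `correction` vector, which is only a stand-in for gene-specific adjustments and is not Seurat's integration algorithm. It shows that a rank-based correlation is unaffected by a monotone transform such as a log, but does change once values are shifted gene by gene so that their ordering within the cell changes.

```python
import numpy as np

def rank_average(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    # Replace the ranks inside each group of tied values by their mean.
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank_average(x), rank_average(y))[0, 1])

# Hypothetical expression of 6 genes in one query cell and one reference profile.
query = np.array([0.0, 3.0, 10.0, 50.0, 120.0, 7.0])
reference = np.array([1.0, 2.0, 20.0, 40.0, 200.0, 5.0])

# A log transform is monotone, so the gene ranks (and the correlation) are unchanged.
rho_raw = spearman(query, reference)
rho_log = spearman(np.log1p(query), np.log1p(reference))

# Assumed stand-in for a correction that shifts each gene by a different amount.
correction = np.array([30.0, -2.0, -8.0, -45.0, -100.0, 40.0])
rho_corrected = spearman(query + correction, reference)

print(rho_raw, rho_log, rho_corrected)
```

In this toy case the first two correlations agree and the third is lower, because the per-gene shifts reorder the genes within the query cell.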
Under the RNA assay there are two slots, data and counts. Which one should be used as SingleR input? I guess the counts slot holds the raw counts and the data slot holds the log-normalized data, right?
SingleR expects the normalized counts, so you want the data slot.
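For anyone unsure how the two slots relate, here is a rough NumPy sketch of the transformation, assuming the data slot was filled by Seurat's default "LogNormalize" method with a scale factor of 10,000. The function name `log_normalize` and the tiny count matrix are made up for illustration; this is not Seurat's code. Note that it uses a natural log (log1p) rather than log2, which should not matter for a rank-based measure like Spearman correlation, since both logs preserve the ordering of the values.

```python
import numpy as np

def log_normalize(counts, scale_factor=10_000):
    """Scale each cell (column) to `scale_factor` total counts, then apply log1p.

    Assumes every cell has at least one count; empty cells are not handled.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=0, keepdims=True)  # total counts per cell
    return np.log1p(counts / totals * scale_factor)

# Hypothetical raw counts: 3 genes (rows) x 2 cells (columns), like the counts slot.
raw_counts = np.array([[10, 0],
                       [30, 5],
                       [60, 15]])

# Log-normalized values, analogous to what the data slot would hold.
normalized = log_normalize(raw_counts)
print(normalized.round(3))
```

In R, the log-normalized matrix itself can be pulled out of the object with Seurat's GetAssayData, selecting the RNA assay and the data slot, and then passed to SingleR as the test input.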