I have single-cell data generated with the BD Rhapsody technology. I build a Seurat object from the raw count data (i.e. the count matrix) with the `CreateSeuratObject()` function, and then convert it with `as.SingleCellExperiment()`, which gives me an `sce` object that I can use for downstream analysis.
The point is that when I look at the `sce` object, it has two assays: one containing the counts and the other containing the logcounts.
Where do those logcounts come from? Are they reliable, and can I use them as they are? Is there a normalisation step in either of the two functions that I am missing?
To give an idea of the code:
```r
library(Seurat)
library(SingleCellExperiment)

# remove cell indexes, i.e. the first column, and transpose the raw-data matrix
# so that genes are rows and cells are columns
transposed <- t(rawD[, -1])
# set column names using the cell indexes
colnames(transposed) <- rawD$Cell_Index
# set row names as gene names, i.e. colnames(rawD) without the first column
rownames(transposed) <- featNames
# create the Seurat object
cMatrix <- CreateSeuratObject(counts = transposed)
# and transform it into a SingleCellExperiment
# https://satijalab.org/seurat/v3.1/conversion_vignette.html
abSce <- as.SingleCellExperiment(cMatrix)
```
Now `abSce` contains the mysterious `logcounts` assay:
```
class: SingleCellExperiment
dim: 23159 2380
metadata(0):
assays(2): counts logcounts
rownames(23159): A1cf A26c3 ... n.TYgta3 n.TYgta8
rowData names(0):
colnames(2380): 656575 253547 ... 779118 760123
colData names(5): orig.ident nCount_RNA nFeature_RNA ident antiB
reducedDimNames(0):
altExpNames(0):
```
What is going on here?
EDIT:
I just found out that
```r
all(assay(mergedSce, "logcounts") == assay(mergedSce, "counts"))
[1] TRUE
```
which basically means that the logcounts assay does not contain actual log counts. Am I right? If so, why create it at all?
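A cheap sanity check, offered as a heuristic sketch rather than anything Seurat or SingleCellExperiment provides: log-normalised values are almost never all whole numbers, so if every entry of a `logcounts` assay is a non-negative integer, the assay has most likely not been normalised. The helper name `looks_like_raw_counts()` is hypothetical, and it converts its input with `as.matrix()`, which can be memory-hungry for a large sparse assay.

```r
# Hypothetical helper: TRUE if every entry is a finite, non-negative whole number,
# which is what raw counts look like and log-normalised values generally do not.
looks_like_raw_counts <- function(m) {
  m <- as.matrix(m)
  all(is.finite(m)) && all(m >= 0) && all(m == round(m))
}

# Toy example: an integer count matrix and its log1p-transformed version
counts <- matrix(c(0, 3, 1, 5, 2, 0), nrow = 2)
looks_like_raw_counts(counts)          # TRUE
looks_like_raw_counts(log1p(counts))   # FALSE
```

On the real object this would be `looks_like_raw_counts(assay(mergedSce, "logcounts"))`; a `TRUE` there is consistent with the observation that the two assays are identical.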
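As for the actual `LogNormalize` method in Seurat (run through `NormalizeData()`): the counts of each feature in a cell are divided by that cell's total counts, multiplied by a scale factor (10,000 by default) and then natural-log transformed with `log1p`. `CreateSeuratObject()` on its own does not normalise anything: the `data` slot of a freshly created object is simply a copy of `counts`, and `as.SingleCellExperiment()` copies `data` into `logcounts`. That is why the two assays are identical unless `NormalizeData()` is run before the conversion.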
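To make the `LogNormalize` arithmetic concrete, here is a base-R sketch of the same computation on a genes-by-cells matrix. The function name `log_normalize()` and the toy matrix are hypothetical; Seurat's own implementation is in compiled code and works on sparse matrices, so this only illustrates the formula.

```r
# Sketch of Seurat's LogNormalize in base R (genes in rows, cells in columns):
# divide each cell's counts by that cell's total, multiply by scale_factor,
# then take the natural log of (1 + value).
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)                       # total counts per cell
  scaled <- sweep(counts, 2, lib_size, FUN = "/")   # per-cell proportions
  log1p(scaled * scale_factor)
}

counts <- matrix(c(1, 3, 0, 4), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
log_normalize(counts)
```

Undoing the log with `expm1()` gives back values that sum to `scale_factor` in every cell, which is a quick way to see that each cell has been scaled to the same total.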
OK, I see. So I was not wrong, and a step was actually missing. At this point, can I safely use the logcounts for other downstream analyses?
Assuming you correctly run the `NormalizeData()` step as described above, you can assume that the 'logcounts' assay does indeed hold the normalized log counts generated by Seurat. Whether you can use these safely depends on what kind of downstream analysis you are doing. For instance, differential expression analysis would generally use the raw counts as input.
Yes, of course. I was thinking about something that actually uses logcounts, such as cell type assignment with SingleR.
The `as.SingleCellExperiment()` function in Seurat has been screwy/incomplete for years. It sometimes won't transfer metadata fully and completely ignores `rowData`, so at times you may want to consider building an SCE from scratch. You can also just normalize via the `logNormCounts()` function in `scuttle`. For SingleR, you can also just take the counts matrix from your Seurat object and run on that.
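A sketch of the build-from-scratch route, assuming the Bioconductor packages `SingleCellExperiment` and `scuttle` are installed and using a toy matrix (hypothetical) in place of the real counts. Note that `logNormCounts()` is not the same transformation as Seurat's `LogNormalize`: it divides by size factors centred at 1 and uses `log2(x + 1)`.

```r
library(SingleCellExperiment)
library(scuttle)

set.seed(7)
# Toy genes x cells count matrix (hypothetical stand-in for the real data)
toy <- matrix(rpois(100 * 30, lambda = 3), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("cell", 1:30)))

# Build the SCE directly from the raw counts, without going through Seurat
sce <- SingleCellExperiment(assays = list(counts = toy))

# logNormCounts() computes library-size factors (centred at 1), divides each
# cell's counts by its size factor, takes log2(x + 1) and stores "logcounts"
sce <- logNormCounts(sce)
assayNames(sce)
```

With the real data, `toy` would be replaced by the transposed count matrix from the question, and any per-cell or per-gene metadata can be passed explicitly through the `colData` and `rowData` arguments of `SingleCellExperiment()`. The resulting object can then be given to SingleR as `test`, which reads the `logcounts` assay by default.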
Thank you for confirming that the function is, at the very least, unclear about what it does. Indeed, as far as I know, nothing in the documentation mentions that the logcounts it returns are only normalised if `NormalizeData()` has been run beforehand. One just assumes that they are fine.