How is Seurat computing logcounts?
3.6 years ago
tbg ▴ 120

I have single-cell data from the BD Rhapsody platform. I load the raw count matrix with the CreateSeuratObject() function and then convert it with as.SingleCellExperiment() to obtain an sce object that I can use for downstream analysis.

The point is that when I look at the sce object, it has two assays, one containing the counts and the other containing the logcounts.

I am wondering where those logcounts come from, whether they are reliable, and whether I can use them as they are. Is there a normalisation step somewhere in the two functions mentioned above that I am missing?

To give an idea of the code:

# remove cell indexes, i.e. first column, and transpose matrix with raw data
transposed <- t(rawD[, -1])

# set colnames using the cells indexes
colnames(transposed) <- rawD$Cell_Index
# set row names as gene names, i.e. colnames(rawD)
rownames(transposed) <- featNames

# create Seurat object
cMatrix <- CreateSeuratObject(counts=transposed)

# and transform it into a SingleCellExperiment
# https://satijalab.org/seurat/v3.1/conversion_vignette.html
abSce <- as.SingleCellExperiment(cMatrix)

Now abSce contains the mysterious logcounts:

class: SingleCellExperiment 
dim: 23159 2380 
metadata(0):
assays(2): counts logcounts
rownames(23159): A1cf A26c3 ... n.TYgta3 n.TYgta8
rowData names(0):
colnames(2380): 656575 253547 ... 779118 760123
colData names(5): orig.ident nCount_RNA nFeature_RNA ident antiB
reducedDimNames(0):
altExpNames(0):

What is going on here?

EDIT:

I just found out that

all(assay(mergedSce, "logcounts") == assay(mergedSce, "counts")) 
[1] TRUE

which basically means that the logcounts assay does not actually contain log counts.

Am I right? If so, why is it created at all?

singlecellexperiment single-cell seurat • 3.6k views
3.6 years ago

The reason is some strange behavior in the conversion of the Seurat object to an SCE object. You need to run the standard normalization step in Seurat before the conversion for the logcounts to be accurate.

seurat <- CreateSeuratObject(counts = counts)

Initially, if you convert the Seurat object, the counts and logcounts are the same.

test.sce <- as.SingleCellExperiment(seurat)
all(assay(test.sce, "logcounts") == assay(test.sce, "counts"))

But if you include the normalization step, the logcounts change.

seurat <- NormalizeData(object = seurat)
test.sce <- as.SingleCellExperiment(seurat)
all(assay(test.sce, "logcounts") == assay(test.sce, "counts"))

Here is some output:

> seurat <- CreateSeuratObject(counts = counts)
> test.sce <- as.SingleCellExperiment(seurat)
> all(assay(test.sce, "logcounts") == assay(test.sce, "counts"))
[1] TRUE
> seurat <- NormalizeData(object = seurat)
Performing log-normalization
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> head(assay(test.sce,"logcounts"))[,1:10]
6 x 10 sparse Matrix of class "dgCMatrix"
     [[ suppressing 10 column names ...]]

Xkr4     . . . . .  .   . . . .
Gm37381  . . . . .  .   . . . .
Rp1      . . . . .  .   . . . .
Sox17    . . . . .  .   . . . .
Gm37323  . . . . .  .   . . . .
Mrpl15  26 . . . . 65 223 . . .
> test.sce <- as.SingleCellExperiment(seurat)
> all(assay(test.sce, "logcounts") == assay(test.sce, "counts"))
[1] FALSE
> head(assay(test.sce,"logcounts"))[,1:10]
6 x 10 sparse Matrix of class "dgCMatrix"
   [[ suppressing 10 column names ...]]

Xkr4    .         . . . . .         .        . . .
Gm37381 .         . . . . .         .        . . .
Rp1     .         . . . . .         .        . . .
Sox17   .         . . . . .         .        . . .
Gm37323 .         . . . . .         .        . . .
Mrpl15  0.3981772 . . . . 0.8011147 1.378488 . . .

As for the actual 'LogNormalize' method in Seurat, it looks like it scales the counts within each cell (dividing by the cell's total and multiplying by a scale factor) and then log-transforms them.
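As a rough illustration of the scaling described above, here is a minimal base-R sketch. It assumes Seurat's default 'LogNormalize' behaviour: each cell's counts are divided by that cell's total, multiplied by a scale factor of 10,000, and passed through log1p. The function name log_normalize and the toy matrix are made up for illustration; this is not Seurat's own code.

```r
# Sketch of Seurat-style log-normalization on a plain genes x cells matrix.
# log_normalize is a hypothetical helper, not a Seurat function.
log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)                      # total counts per cell
  scaled <- sweep(counts, 2, lib_size, "/") * scale_factor
  log1p(scaled)                                    # natural log of (1 + x)
}

# Toy raw counts (values invented for the example): 4 genes, 2 cells.
counts <- matrix(
  c(0, 3,
    5, 0,
    10, 4,
    1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("cell1", "cell2"))
)

norm <- log_normalize(counts)
norm
```

Zeros stay zero, and every cell is put on the same total before the log, which is why the normalized values no longer match the raw counts. A cell with a total of zero would produce NaN here, so empty cells should be filtered out first.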


OK, I see. So I was not wrong and something was actually missing. At this point I can safely use the logcounts for other downstream analyses. Is that right?


Assuming you run the NormalizeData step as described above, the 'logcounts' assay will indeed contain the normalized log counts generated by Seurat. Whether you can use them safely depends on the kind of downstream analysis you are doing. For instance, differential expression analysis would generally use the raw counts as input.
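Before relying on a 'logcounts' assay, a quick sanity check can help. The sketch below is only a heuristic and rests on one assumption: raw counts are whole numbers, while log-normalized values almost never are. The helper name looks_untransformed is hypothetical.

```r
# Heuristic: TRUE if every non-zero value is a whole number, which suggests
# the matrix still holds raw counts rather than log-normalized values.
# looks_untransformed is a made-up helper name for this example.
looks_untransformed <- function(m) {
  vals <- as.numeric(as.matrix(m))   # dense copy; fine for a quick check
  vals <- vals[vals != 0]
  length(vals) > 0 && all(vals == round(vals))
}

# Values taken from the output shown earlier in this thread.
raw_like <- matrix(c(0, 26, 65, 223), nrow = 2)
log_like <- matrix(c(0, 0.3981772, 0.8011147, 1.378488), nrow = 2)

looks_untransformed(raw_like)
looks_untransformed(log_like)
```

On a real object this would be called as looks_untransformed(assay(test.sce, "logcounts")). For very large matrices the dense conversion can be expensive, so checking a subset of columns may be preferable.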


Yes, of course. I was thinking about something that actually uses logcounts, such as cell type assignment with SingleR.


The as.SingleCellExperiment() function in Seurat has been screwy/incomplete for years. It sometimes won't transfer metadata fully and completely ignores rowData, so at times, you may want to consider building an SCE from scratch. You can also just normalize via the logNormCounts function in scuttle.

For SingleR, you can also just snag the counts matrix from your Seurat object and run on that.
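Building the SCE from scratch and normalizing with scuttle could look roughly like the sketch below. It assumes the SingleCellExperiment and scuttle packages from Bioconductor are installed; the toy matrix and the "sample" metadata column are invented for illustration.

```r
library(SingleCellExperiment)
library(scuttle)

# Toy raw count matrix (genes x cells); values are made up for the example.
counts <- matrix(
  c(0, 3, 1,
    5, 0, 2,
    10, 4, 0,
    1, 1, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Build the SCE directly from the raw counts and attach cell metadata by hand.
sce <- SingleCellExperiment(
  assays  = list(counts = counts),
  colData = DataFrame(sample = c("a", "a", "b"), row.names = colnames(counts))
)

# Adds a "logcounts" assay; by default this is log2(counts / size factor + 1),
# with library-size factors scaled to a mean of 1.
sce <- logNormCounts(sce)

assayNames(sce)
```

With the data from the question, the transposed count matrix would take the place of the toy counts. Note that these logcounts are on a log2 scale with library-size factors, so they will not be numerically identical to Seurat's NormalizeData output.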


Thank you for confirming that the function is, at the very least, unclear about what it does. Indeed, nothing in the documentation (as far as I know) mentions that the data must be normalised before the returned logcounts are meaningful. One just assumes that they should be fine.
