Question

Estimating sequencing depth or mean reads per cell

3.4 years ago

Lekshmy ▴ 10

I have been working with multiple single-cell RNA-seq datasets. In order to compare the sequencing depth of multiple samples, I am trying to find the mean reads per cell. I used the following code:

# Sum the counts in each column (one column per cell barcode)
counts_per_cell <- Matrix::colSums(sample1)
# Average of the per-cell totals
mean(counts_per_cell)

The value comes out to 1392, which seems very small. As I am using a publicly available dataset, I don't know the actual sequencing depth. Can anyone guide me here? How do I find the sequencing depth from a Seurat object?

sc-rna • 2.1k views

ADD COMMENT • link updated 3.4 years ago by ATpoint 86k • written 3.4 years ago by Lekshmy ▴ 10

I guess you have to go back to the results from the aligner/quantifier (e.g. CellRanger) to know for sure. The thing is that you cannot use colSums on the count matrix: in the presence of UMIs, many reads are collapsed into a single UMI count, so you are vastly underestimating the number of reads.
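To illustrate the point, here is a minimal toy sketch in R. The read table, barcodes, UMIs and gene names are all hypothetical and do not come from any real dataset; the sketch only shows why summing a UMI count matrix gives molecules per cell rather than reads per cell.

```r
# Toy read table: one row per sequenced read (all values are made up).
reads <- data.frame(
  cell = c("A", "A", "A", "A", "A", "B", "B", "B"),
  umi  = c("u1", "u1", "u1", "u2", "u2", "u3", "u3", "u4"),
  gene = c("g1", "g1", "g1", "g2", "g2", "g1", "g1", "g1")
)

# Reads per cell: count every row belonging to a barcode.
reads_per_cell <- table(reads$cell)

# UMI counts per cell: reads sharing the same cell, UMI and gene
# collapse to a single molecule, which is what the count matrix stores.
molecules <- unique(reads)
umis_per_cell <- table(molecules$cell)

mean_reads <- mean(as.numeric(reads_per_cell))  # what was actually sequenced
mean_umis  <- mean(as.numeric(umis_per_cell))   # what colSums would report
print(c(mean_reads = mean_reads, mean_umis = mean_umis))
```

In this toy case the mean from the collapsed counts is half the true mean reads per cell, and nothing in the collapsed table reveals by how much it is off.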

ADD REPLY • link 3.4 years ago by ATpoint 86k

Thank you so much for helping me out!!

Is there any way to get back to the raw FASTQ files from the Cell Ranger output (barcodes.tsv, genes.tsv and matrix.mtx)?

Also, the output of colSums actually gives me the number of reads for each cell, one value per barcode. Is it right to take the mean of all the output to give an approximate estimate of mean reads per cell for a given sample?

ADD REPLY • link 3.4 years ago by Lekshmy ▴ 10

Is it right to take the mean of all the output to give an approximate estimate of mean reads per cell for a given sample?

As I said, if you have UMIs then this is inappropriate: a single UMI count could come from one read or from 1000, and there is no way to tell from the count matrix. For non-UMI data, the colSums should indeed roughly represent the on-target reads. It also depends on how you define reads per cell. Say your pipeline identified 5000 cells and you sequenced 250 million reads. Would the metric you want be 250 million / 5000, i.e. all reads divided by the number of cells, or would it be the reads successfully mapped to the transcriptome? In any case, I think that for non-UMI data you need the colSums plus the mapping percentage, and for UMI data either the FASTQ files or some kind of summary from CellRanger or a similar quantification pipeline. Usually (from what I understand) the recommendations, e.g. from 10X, are given as a fixed number of raw reads per cell. There is a lot of uncertainty in this due to the mapping rate, the number of truly detected cells versus the cells expected from library prep, and the presence of UMIs.
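The two definitions above can be worked through as a rough sketch in R. The 250 million reads and 5000 cells are the numbers from the example; the mapping rate of 0.5 and the small count matrix are purely hypothetical values chosen for illustration, and in practice the mapping rate would have to come from the aligner's own report.

```r
# Numbers from the example above.
total_reads <- 250e6
n_cells     <- 5000

# Hypothetical fraction of reads mapped to the transcriptome (assumption).
mapping_rate <- 0.5

# Definition 1: all sequenced reads divided by the number of cells.
raw_reads_per_cell <- total_reads / n_cells

# Definition 2: only reads that mapped to the transcriptome.
mapped_reads_per_cell <- raw_reads_per_cell * mapping_rate

# Non-UMI data only: colSums approximates on-target reads per cell, so
# dividing by the mapping rate gives a rough estimate of raw reads per cell.
counts <- matrix(c(100, 200, 150, 350), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("cellA", "cellB")))
on_target_per_cell <- colSums(counts)
est_raw_per_cell   <- on_target_per_cell / mapping_rate

print(raw_reads_per_cell)
print(mapped_reads_per_cell)
print(est_raw_per_cell)
```

The last step is only meaningful for non-UMI data; for UMI data the collapsed counts cannot be converted back to reads this way.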

Others might have different ideas.

ADD REPLY • link 3.4 years ago by ATpoint 86k

It would also be great if you could suggest a method I can use to compare the sequencing depth of different samples.

ADD REPLY • link 3.4 years ago by Lekshmy ▴ 10