3.3 years ago
Lekshmy
I have been working with multiple single-cell RNA datasets. In order to compare the sequencing depth of multiple samples, I am trying to find the mean reads per cell. I used the following code:
counts_per_cell <- Matrix::colSums(sample1)
mean(counts_per_cell)
The value comes out as 1392, which is very small. As I am using a publicly available dataset, I don't know the actual sequencing depth. Can anyone guide me here? How do I find the sequencing depth from a Seurat object?
I guess you have to go back to the results from the aligner/quantifier (e.g. CellRanger) to know for sure. The thing is that you cannot use colSums on the count matrix: in the presence of UMIs, many reads are collapsed into a single UMI count, so you would be vastly underestimating the number of reads.
Thank you so much for helping me out!
Is there any way to go back to the raw fastq files from the Cell Ranger output (barcodes.tsv, genes.tsv and matrix.mtx)?
Also, the output of colSums gives me a count for each cell, one per barcode. Is it right to take the mean of these values as an approximate estimate of the mean reads per cell for a given sample?
As I said, if you have UMIs then this is inappropriate, as you could have a single read per UMI or 1000, and there is no way to know from the count matrix. For non-UMI data the colSums should probably represent the on-target reads indeed.
It also depends on how you define reads per cell. Say your pipeline identified 5000 cells and 250 million reads were sequenced. Would the metric you want be 250 million / 5000, so all reads divided by the number of cells, or would it be the reads successfully mapped to the transcriptome? In any case, I think that for non-UMI data you need the colSums plus the mapping percentage, and for UMI data either the fastq files or some kind of summary from CellRanger or a similar quantification pipeline.
Usually (from what I understand) the recommendations, e.g. from 10X, are given as a fixed number of raw reads per cell. There is a lot of uncertainty in this due to the mapping rate, the number of truly detected cells versus the cells expected from the library prep, and the presence of UMIs.
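To make the two definitions and the UMI issue concrete, here is a minimal sketch in base R. The 250 million reads and 5000 cells are the example numbers from above; the 60% mapping rate and the tiny table of reads are made-up values for illustration only, not output from any real pipeline.

```r
# Example numbers from the post: 250 million sequenced reads, 5000 called cells.
total_reads <- 250e6
n_cells <- 5000
mapping_rate <- 0.60  # hypothetical fraction of reads mapped to the transcriptome

# Definition 1: all sequenced reads divided by the number of cells.
raw_reads_per_cell <- total_reads / n_cells

# Definition 2: only reads mapped to the transcriptome, per cell.
mapped_reads_per_cell <- raw_reads_per_cell * mapping_rate

# Toy illustration of UMI collapsing (made-up reads):
# reads sharing the same cell barcode, UMI and gene are counted once.
reads <- data.frame(
  cell = c("A", "A", "A", "A", "B", "B"),
  umi  = c("u1", "u1", "u1", "u2", "u3", "u3"),
  gene = c("g1", "g1", "g1", "g1", "g2", "g2")
)

reads_per_cell <- table(reads$cell)          # number of reads per cell
umis_per_cell  <- table(unique(reads)$cell)  # number of distinct UMIs per cell

print(raw_reads_per_cell)
print(mapped_reads_per_cell)
print(reads_per_cell)
print(umis_per_cell)
```

In this toy case cell A has 4 reads but only 2 UMIs. The column sums of a UMI count matrix correspond to the second number, which is why they cannot be read back as reads per cell.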
Others might have different ideas.
It would also be great if you could suggest a method I can use to compare the sequencing depth of different samples.