21 months ago
bompipi95
I am reading a publication that sequenced its 10x scRNA-seq library to around 50,000-100,000 reads / cell. When I performed QC and checked the nCount_RNA metric, the median nCount_RNA across all samples was around 3600.
Since nCount_RNA is the number of detected molecules (UMIs) / cell, I presume the large gap between reads / cell and detected molecules / cell arises because many of the reads are PCR duplicates (they share the same UMI). Is this a correct interpretation of the discrepancy? I also presume that these numbers are typical of an scRNA-seq library.
This is typical lab-rat accounting (sorry lab-rats!). What they are saying is that they estimated that (say) 50,000 cells had been processed on the Chromium chip, based on the 10x manufacturer's figures and the number of lanes that didn't clog. They sequenced to a depth of 5 billion reads, which works out to 100,000 reads per cell (on average).
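To make that arithmetic explicit, here is a minimal sketch of the nominal calculation. The function name is hypothetical, and the inputs are just the "say" numbers from the paragraph above, not values from any particular publication.

```python
def nominal_reads_per_cell(total_reads: int, estimated_cells: int) -> float:
    """Nominal depth: total reads divided by the *estimated* number of cells loaded.

    This is the input-normalized figure, not the number of usable reads per cell.
    """
    if estimated_cells <= 0:
        raise ValueError("estimated_cells must be positive")
    return total_reads / estimated_cells


# Illustrative numbers from the text: 5 billion reads, ~50,000 estimated cells.
print(nominal_reads_per_cell(5_000_000_000, 50_000))  # 100000.0
```

Note that the denominator is an estimate made before sequencing, so the result says nothing about how many of those reads end up in an analyzed cell.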
However.
1) Not all i5 and i7 barcodes will be read or match a proper barcode
2) Not all RNA reads will come from the cells of interest (adapters, contamination, other junk)
3) Not all RNA reads will come from a droplet with a cell (leaky cDNA creating background barcodes)
4) Not all RNA reads will align
5) Not all RNA reads will be unique (molecular duplicates)
6) Not all cells will generate an analyzable number (say >= 500) of UMIs
The ultimate efficiency (% of reads belonging to an analyzed cell) depends on a number of factors -- not the least of which is the RNA integrity to start with -- and can wind up being as low as 15%.
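The way these losses compound can be sketched as a simple multiplicative funnel. Every retention fraction below is a made-up placeholder chosen only so the product lands near the 15% mentioned above; real values vary by library and should be taken from your own pipeline's summary metrics. Duplicate collapse (point 5) is left out of the funnel because it reduces reads to UMIs rather than removing reads from analyzed cells.

```python
from math import prod

# HYPOTHETICAL per-stage retention fractions (not measured values).
STAGES = [
    ("valid index / cell barcode", 0.90),
    ("not adapter or contaminant", 0.90),
    ("from a cell-containing droplet", 0.70),
    ("aligned", 0.80),
    ("cell passes the UMI threshold", 0.33),
]


def cumulative_retention(stages):
    """Return (stage name, fraction of the original reads still retained) pairs."""
    kept = 1.0
    out = []
    for name, fraction in stages:
        kept *= fraction
        out.append((name, kept))
    return out


for name, kept in cumulative_retention(STAGES):
    print(f"{name:32s} {kept:6.1%}")

# Overall efficiency is the product of all stage fractions.
efficiency = prod(fraction for _, fraction in STAGES)
usable_reads_per_cell = 100_000 * efficiency  # nominal depth from the example above
```

With placeholder numbers like these, a nominal 100,000 reads / cell shrinks to roughly 15,000 reads assigned to analyzed cells, and collapsing duplicates into UMIs then lowers the per-cell count further.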
IMO, to the epigram "there are Lies, Damned Lies, and Statistics" we should append "input-normalized sequencing depth".
Perfect.. thank you!