Question

10x scRNAseq samples with similar reads/cell but vastly differing UMI/cell

3.2 years ago

matt.a.bennett25890 ▴ 30

Hi all,

I've been trying to get my head around some discrepancies between a couple of 10x datasets. Both were sequenced to a depth of ~500M reads total and yielded 2-3k cells, but they give vastly different UMI/cell as final output from CellRanger (v3). Further details and screenshots from CellRanger are below; any possible explanations appreciated!

Dataset 1 is from an in vitro cell culture experiment, yielding 3.6k cells with roughly 127k reads/cell and what seems like a very large 40k UMIs/cell:

[Screenshot: dataset1summary]

Dataset 2 is whole tissue, yielding fewer cells (2.2k) and therefore a greater depth of 213k reads/cell. However, this gives only 3k UMIs/cell, obviously much lower than in Dataset 1 despite the higher number of reads/cell:

[Screenshot: dataset2summary]

I'm not sure how to account for a discrepancy this large. Mapping stats seem largely similar, although the rates are somewhat higher for Dataset 1. To me it could imply either a much greater amount of PCR duplication in Dataset 2 or a lack of accounting for PCR duplication in Dataset 1 for some reason... No idea why though at this stage!
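To put rough numbers on that, here is a minimal Python sketch using only the approximate figures quoted above. The function names are made up for illustration. Note that reads/cell in the summary is total reads divided by cell count, so it includes unmapped reads and reads from non-cell barcodes; the ratios below are therefore only a crude upper bound on the true reads per UMI.

```python
# Approximate figures quoted in the post (not exact Cell Ranger output).
datasets = {
    "dataset1": {"cells": 3600, "reads_per_cell": 127_000, "umis_per_cell": 40_000},
    "dataset2": {"cells": 2200, "reads_per_cell": 213_000, "umis_per_cell": 3_000},
}


def reads_per_umi(reads_per_cell, umis_per_cell):
    """Crude average number of reads spent on each counted UMI."""
    return reads_per_cell / umis_per_cell


def crude_duplicate_fraction(reads_per_cell, umis_per_cell):
    """Fraction of reads that did not contribute a new UMI (upper bound)."""
    return 1.0 - umis_per_cell / reads_per_cell


for name, d in datasets.items():
    ratio = reads_per_umi(d["reads_per_cell"], d["umis_per_cell"])
    dup = crude_duplicate_fraction(d["reads_per_cell"], d["umis_per_cell"])
    implied_total_reads = d["cells"] * d["reads_per_cell"]
    print(f"{name}: {ratio:.1f} reads/UMI, "
          f"{dup:.1%} of reads not yielding a new UMI, "
          f"~{implied_total_reads / 1e6:.0f}M reads implied in total")
```

On these figures Dataset 2 spends roughly twenty times more reads per UMI than Dataset 1, which is what you would expect if its library contained far fewer distinct molecules.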

Thanks for any suggestions

10x scRNAseq UMI QC • 1.4k views

Just a couple of thoughts:

PCR duplication is inherently accounted for by the UMI-based de-duplication (unless something went wrong specifically with the UMI sequencing, and I find it hard to come up with a scenario that would cause that). So if PCR duplication were a factor, it should have affected data set 2 more than data set 1.
The first data set looks much better than the second one, especially in the knee plot: the first shows a very clear separation of a good number of droplets with a great variety of UMIs, whereas the second does not.
40k UMIs per cell does sound high. Paired with the higher numbers of cells and genes, it would suggest that you have sequenced a great diversity of transcripts, but at relatively low expression? Not sure.
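To illustrate the first point, here is a toy Python sketch of how UMI-based de-duplication collapses PCR duplicates. It is not CellRanger's actual implementation: the barcodes, gene names and the function `count_umis` are invented for the example, and real pipelines also handle sequencing errors in barcodes and UMIs, which this exact-match version ignores.

```python
from collections import Counter


def count_umis(reads):
    """Count distinct UMIs per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per
    mapped read. Reads sharing all three fields are treated as PCR
    duplicates of one original molecule and counted once.
    """
    unique_molecules = set(reads)  # collapse identical (barcode, gene, UMI)
    return Counter((barcode, gene) for barcode, gene, _umi in unique_molecules)


# Hypothetical reads: the first three are PCR duplicates of one molecule.
reads = [
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "TTGCA"),
    ("AAAC", "GeneA", "CCGTA"),  # same cell and gene, different molecule
    ("AAAC", "GeneB", "TTGCA"),  # same UMI but a different gene
    ("GGGT", "GeneA", "TTGCA"),  # same UMI but a different cell
]

counts = count_umis(reads)
for (barcode, gene), n in sorted(counts.items()):
    print(barcode, gene, n)
```

Six reads collapse to four molecules here, so extra PCR cycles add reads without adding UMI counts.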

In any case, I'd simply process the data and try to see if there are more clues as to what the quantitative and qualitative differences are and what may have caused them. Kudos for delving into the specifics of the CellRanger summary, it's a great start!

3.2 years ago by Friederike 9.0k

Thanks for the suggestions!

I'm wondering whether initial sample quality may play a role, given that dataset2 comes from whole tissue, which may have been harder to prepare cleanly (e.g. more RNase or some other contaminant, or more time taken for the protocol to finish, allowing RNA to degrade). Could that mean less starting RNA going into cDNA synthesis and so more PCR duplication?
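A rough way to test that idea is a simple saturation model, sketched in Python below. It assumes every captured molecule in a cell is equally likely to be sequenced (unrealistic, since amplification is uneven) and that all reads/cell are usable (also not true), so the output is an order-of-magnitude guide at best. The function names are made up for illustration.

```python
import math


def expected_unique_molecules(n_molecules, n_reads):
    """Expected distinct molecules seen when drawing n_reads uniformly,
    with replacement, from a library of n_molecules."""
    return n_molecules * (1.0 - math.exp(-n_reads / n_molecules))


def estimate_library_size(observed_umis, n_reads, upper=1e9):
    """Invert the model by bisection: find the library size that would
    give observed_umis distinct molecules after n_reads reads."""
    lo, hi = float(observed_umis), float(upper)
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if expected_unique_molecules(mid, n_reads) < observed_umis:
            lo = mid  # library too small to explain the observed UMIs
        else:
            hi = mid
    return (lo + hi) / 2.0


# Approximate per-cell figures from the original post.
for name, umis, reads in [("dataset1", 40_000, 127_000),
                          ("dataset2", 3_000, 213_000)]:
    size = estimate_library_size(umis, reads)
    print(f"{name}: ~{size:,.0f} captured molecules per cell under this model")
```

Under these assumptions both libraries look close to saturated, which would put the difference in how many molecules were captured per cell rather than in how deeply they were sequenced.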

I also note the quite low mapping % to transcriptome in dataset2, which I suppose could be a factor. Maybe it comes from background PCR amplification or fragmented cDNA. No idea why this wouldn't reduce the genome mapping % too, though!

3.2 years ago by matt.a.bennett25890 ▴ 30