Hi all,
Been trying to get my head around some discrepancies between a couple of 10x datasets. Both were sequenced to a depth of ~500M reads in total and yielded 2-3k cells, but they give vastly different UMIs/cell in the final output from CellRanger (v3). Here are further details + screenshots from CellRanger; any possible explanations appreciated!
Dataset 1 is from an in vitro cell culture experiment, yielding 3.6k cells with roughly 127k reads/cell and what seems like a very large 40k UMIs/cell:
Dataset 2 is whole tissue, yielding fewer cells (2.2k) and so a greater depth of 213k reads/cell. However, this gives only 3k UMIs/cell, obviously much lower than in Dataset 1 despite the higher number of reads/cell:
Not sure how to account for a discrepancy this large. Mapping stats seem largely similar, although the rates are somewhat higher for Dataset 1. To me it could imply either a much greater amount of PCR duplication in Dataset 2 or a lack of accounting for PCR duplication in Dataset 1 for some reason... No idea why though at this stage!
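To put rough numbers on the duplication idea, here is a minimal Python sketch using only the rounded figures quoted above. The function name `duplication_summary` and the `approx_saturation` proxy are my own for illustration. The proxy is only in the spirit of CellRanger's sequencing saturation metric, which as far as I understand is computed from confidently mapped reads with valid barcodes and UMIs rather than from all reads, so treat these as ballpark values.

```python
# Figures quoted in the post (rounded CellRanger summary values).
datasets = {
    "dataset1": {"cells": 3600, "reads_per_cell": 127_000, "umis_per_cell": 40_000},
    "dataset2": {"cells": 2200, "reads_per_cell": 213_000, "umis_per_cell": 3_000},
}


def duplication_summary(cells, reads_per_cell, umis_per_cell):
    """Back-of-the-envelope duplication figures for one dataset."""
    return {
        # Total reads implied by the per-cell depth.
        "total_reads": cells * reads_per_cell,
        # Average number of reads spent on each unique molecule.
        "reads_per_umi": reads_per_cell / umis_per_cell,
        # Crude proxy: fraction of reads that did not add a new UMI.
        # ASSUMPTION: treats every read as usable, which overstates
        # duplication when many reads are unmapped or outside cells.
        "approx_saturation": 1 - umis_per_cell / reads_per_cell,
    }


summaries = {name: duplication_summary(**d) for name, d in datasets.items()}

for name, s in summaries.items():
    print(
        f"{name}: total reads {s['total_reads'] / 1e6:.0f}M, "
        f"{s['reads_per_umi']:.1f} reads/UMI, "
        f"approx. saturation {s['approx_saturation']:.1%}"
    )
```

On these figures Dataset 2 spends roughly 70 reads on each UMI versus about 3 for Dataset 1, so the gap looks more like a difference in library complexity than something a deduplication quirk alone would explain.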
Thanks for any suggestions
Just a couple of thoughts:
In any case, I'd simply process the data and try to see if there are more clues as to what the quantitative and qualitative differences are and what may have caused them. Kudos for delving into the specifics of the CellRanger summary, it's a great start!
Thanks for the suggestions!
I'm wondering whether initial sample quality may play a role, given dataset2 comes from whole tissue, which may have been harder to prepare cleanly (e.g. more RNase or some other contaminant, or more time taken for the protocol to finish and so more time for RNA to degrade). That could mean less starting RNA going into cDNA synthesis, and so more PCR duplication?
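One way to sanity-check the "less starting RNA" idea is to ask how many distinct molecules per cell would be needed to explain the observed UMIs at the observed depth. The sketch below assumes a simple uniform sampling model, where N molecules sampled with R reads give about N(1 - exp(-R/N)) distinct UMIs, and inverts it by bisection. The model and the names `expected_umis` and `estimate_complexity` are my assumptions for illustration, not anything CellRanger computes; real libraries amplify unevenly and not every read is usable, so the results are only indicative.

```python
import math


def expected_umis(n_molecules, n_reads):
    """Expected distinct UMIs when n_reads sample n_molecules uniformly."""
    return n_molecules * (1 - math.exp(-n_reads / n_molecules))


def estimate_complexity(n_reads, n_umis, tol=1e-6):
    """Estimate molecules per cell from reads/cell and UMIs/cell.

    ASSUMPTION: uniform sampling with replacement (no amplification bias).
    Solves expected_umis(N, n_reads) == n_umis for N by bisection.
    """
    if not 0 < n_umis < n_reads:
        raise ValueError("need 0 < n_umis < n_reads")
    # expected_umis increases with N and never exceeds N, so N >= n_umis.
    lo = hi = float(n_umis)
    while expected_umis(hi, n_reads) < n_umis:
        hi *= 2  # expand until the target is bracketed
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if expected_umis(mid, n_reads) < n_umis:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Rounded per-cell figures quoted earlier in the thread.
complexity_1 = estimate_complexity(n_reads=127_000, n_umis=40_000)
complexity_2 = estimate_complexity(n_reads=213_000, n_umis=3_000)

print(f"dataset1: ~{complexity_1:,.0f} molecules/cell under this model")
print(f"dataset2: ~{complexity_2:,.0f} molecules/cell under this model")
```

Under this model dataset2 comes out essentially exhausted at about 3k molecules per cell, while dataset1 has room for around 42k, which would fit with much less capturable RNA per cell in the tissue sample (or a smaller cell type) rather than a counting problem.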
Also noting the quite low mapping % to the transcriptome in dataset2, which I suppose could be a factor. Maybe from background PCR amplification or fragmented cDNA. No idea why this wouldn't also reduce the genome mapping % though!