Question

scRNAseq: thoughts on merging datasets generated from different versions of 10x Cell Ranger?


17 months ago

simplitia ▴ 130

I'm using Harmony + Seurat to merge a series of datasets generated with Cell Ranger. Unfortunately, not all of the datasets have FASTQ files available that I can download and re-align. For most of them I have successfully generated count matrices with Cell Ranger 7.1; however, a few were previously generated with version 3.1.

My question is: what are your thoughts on merging datasets that were generated with different versions? Should I just re-align the ones I have FASTQ files for using the lowest common denominator, which is 3.1? (I'm hesitant to do this, since most of my data was processed with 7.1.) What do you think?

thanks in advance!

scRNAseq Harmony 10x Seurat • 1.2k views

updated 17 months ago by Rob 6.9k • written 17 months ago by simplitia ▴ 130

score 0 · Answer 1 · 2023-07-20

I would use the UMI matrices you currently have, even though they were generated with different versions of Cell Ranger.

My reasoning is as follows: the newer version of Cell Ranger may be more sensitive and capture more alignments, which gives you more power to detect significant signal later in the pipeline, and that is valuable for the research. The older version may have less power, but that is okay: combining the datasets still gives more power because the sample size is larger.

score 0 · Answer 2 · 2023-07-20

Personally, I would be hesitant to integrate this data. The reason is that there was a _major_ shift in policy from CellRanger < 7 to CellRanger >= 7: intronic UMIs are now included in gene counts by default. This means that, across these versions of CellRanger, you are effectively _measuring different things_. From that perspective, it doesn't make much sense to me to integrate data across such different versions, though it's probably OK to integrate across versions where the major policies on how things are counted didn't change.

As a bit of self-promotion: if you have access to the raw data and would like to quickly reprocess it all so that it can be meaningfully integrated, you can give alevin-fry (paper) a shot. It's much faster than CellRanger and explicitly counts both spliced and unspliced molecules, but puts them into separate layers of the count matrix, so that you can decide in post-processing how you wish to deal with them (i.e., combine them if you want, or keep them separate).
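
To make the "decide in post-processing" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the layer names `spliced` and `unspliced`, the toy counts, and the helper `combine_layers` are hypothetical and are not alevin-fry's actual API or output format. Real data would normally be held in sparse matrices rather than nested lists. The point is only that, once the layers are separate, you can build either an intron-inclusive matrix or a spliced-only matrix and apply the same choice to every dataset before integrating.

```python
# Hypothetical toy layers: rows are cells, columns are genes.
# The layer names and values are made up for illustration.
layers = {
    "spliced":   [[5, 0, 2], [1, 3, 0]],
    "unspliced": [[2, 1, 0], [0, 4, 1]],
}


def combine_layers(layers, names):
    """Sum the named count layers element-wise into one cells x genes matrix."""
    first = layers[names[0]]
    # Start from zeros so the input layers are never modified.
    combined = [[0] * len(row) for row in first]
    for name in names:
        for i, row in enumerate(layers[name]):
            for j, value in enumerate(row):
                combined[i][j] += value
    return combined


# Spliced-only counts: roughly in the spirit of counting without introns.
spliced_only = combine_layers(layers, ["spliced"])

# Spliced + unspliced counts: roughly in the spirit of intron-inclusive counting.
intron_inclusive = combine_layers(layers, ["spliced", "unspliced"])
```

Whichever convention you pick, the important part is that it is the same for all datasets going into the integration. Note that this is only an approximation of what either counting policy does, since reads that cannot be assigned unambiguously to spliced or unspliced molecules may be handled differently by each tool.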