Hi~ I'm currently using 10x cellranger to analyse single-cell RNA-seq data. According to their algorithm, reads that map confidently to more than one exon are discarded. However, there are paralogous genes in the genome that are largely identical, so all the reads for such genes are discarded. Therefore, I was wondering if there is a way to change the algorithm to count the first (or a random) confident alignment instead. Unfortunately, I wasn't able to locate the file containing the algorithm. Any hints would be appreciated. Thanks, everyone!
There's another option now, if you do not want to use the pseudo-alignment algorithms used by salmon/alevin and kallisto. STAR has a workflow named STARsolo that allows you to get results that correlate with Cell Ranger very well, but correctly account for multimappers.
Is there any documentation available on what STARsolo does with multimappers? The manual does not seem to mention it specifically for STARsolo.
As far as I am aware, neither Cell Ranger nor STARsolo "handles" gene-ambiguous reads. Such reads are discarded by those pipelines, because the UMI resolution algorithm assumes that related UMIs (the UMIs that will be deduplicated) align to the same gene.
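To make the "discard" behaviour concrete, here is a minimal Python sketch of unique-gene UMI counting for a toy set of reads. It is only an illustration of the idea described above, not the actual Cell Ranger or STARsolo code: the function name, the record layout, and the gene names are all made up, and the UMI error-correction step that the real pipelines perform is left out.

```python
from collections import Counter


def count_unique_gene_umis(records):
    """Count deduplicated UMIs per (cell, gene), dropping gene-ambiguous reads.

    `records` is an iterable of (cell_barcode, umi, genes), where `genes` is
    the set of genes the read is compatible with (hypothetical layout).
    Returns (counts, n_discarded_reads).
    """
    seen = set()          # (cell, umi, gene) triples already counted
    counts = Counter()    # (cell, gene) -> number of distinct UMIs
    discarded = 0
    for cell, umi, genes in records:
        if len(genes) != 1:
            # Read is compatible with several genes (e.g. paralogs): drop it.
            discarded += 1
            continue
        gene = next(iter(genes))
        key = (cell, umi, gene)
        if key not in seen:   # collapse PCR duplicates of the same UMI
            seen.add(key)
            counts[(cell, gene)] += 1
    return counts, discarded


# Toy input: two paralogs share all of their reads.
records = [
    ("AAAC", "UMI1", {"GENE_A"}),
    ("AAAC", "UMI1", {"GENE_A"}),                  # PCR duplicate
    ("AAAC", "UMI2", {"PARALOG_1", "PARALOG_2"}),  # gene-ambiguous
    ("AAAC", "UMI3", {"PARALOG_1", "PARALOG_2"}),  # gene-ambiguous
    ("AAAC", "UMI4", {"GENE_B"}),
]
counts, discarded = count_unique_gene_umis(records)
print(dict(counts), discarded)
```

In this toy example neither paralog receives any count, which is exactly the situation described in the original question.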
Ok, perhaps I am confused - are STARsolo and STARsolo-Quant the same thing?
I've looked through this presentation and it describes what sounds like a typical EM-based approach that can account for multimappers, similar to what rsem/kallisto/salmon are using:
https://f1000research.com/slides/8-1897
STARsolo is not STARsolo-Quant. STARsolo is the single-cell mode of STAR; it is actively developed, maintained, and improved. It is usable today as a much more efficient, near drop-in replacement for Cell Ranger, and it uses a UMI resolution algorithm specifically designed to be very similar to the one used in Cell Ranger. STARsolo-Quant, on the other hand, is a protocol discussed in the slides you link, and about which there was a talk at Genome Informatics in 2019. It was (or is) a research project, but I don't know of any official documentation on how to run or use the protocol. It also works differently from STARsolo itself (and from alevin, Cell Ranger, or kallisto), in that all available documentation suggests it performs multi-mapping resolution (a) at the transcript level and (b) only at the cluster level. alevin, by contrast, performs gene-level multi-mapping resolution, and does so at the cell level. The other methods (including STARsolo) discard gene-multi-mapping UMIs at the cell level, so those UMIs are not counted in the gene x cell count matrix that these tools output.
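For readers wondering what "multi-mapping resolution" means in practice, below is a toy Python sketch of a generic EM procedure that splits gene-ambiguous UMIs between genes within a single cell. This is an assumption-laden illustration of the general EM idea mentioned in this thread, not the implementation used by alevin, STARsolo-Quant, or any other tool: the function name, the equivalence-class input format, and the gene names are hypothetical.

```python
def em_gene_counts(eq_classes, n_iter=100):
    """Toy EM over gene equivalence classes for one cell.

    `eq_classes` maps a frozenset of gene IDs to the number of deduplicated
    UMIs compatible with exactly that set of genes (hypothetical format).
    Returns a dict gene -> estimated (fractional) UMI count.
    """
    genes = sorted({g for cls in eq_classes for g in cls})
    total = sum(eq_classes.values())
    # Start from a uniform guess over all genes.
    abundance = {g: total / len(genes) for g in genes}
    for _ in range(n_iter):
        updated = dict.fromkeys(genes, 0.0)
        for cls, n_umis in eq_classes.items():
            denom = sum(abundance[g] for g in cls)
            for g in cls:
                # Share each class's UMIs in proportion to current estimates;
                # fall back to an even split if all estimates are zero.
                share = abundance[g] / denom if denom > 0 else 1 / len(cls)
                updated[g] += n_umis * share
        abundance = updated
    return abundance


# Toy cell: 3 UMIs unique to PARALOG_1, 1 unique to PARALOG_2, 4 ambiguous.
eq_classes = {
    frozenset({"PARALOG_1"}): 3,
    frozenset({"PARALOG_2"}): 1,
    frozenset({"PARALOG_1", "PARALOG_2"}): 4,
}
estimates = em_gene_counts(eq_classes)
print(estimates)
```

The ambiguous UMIs end up shared in proportion to the evidence from uniquely assigned UMIs, rather than being thrown away. A cluster-level variant would, under the same assumptions, pool the equivalence classes of all cells in a cluster before running the same loop.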
Thank you, this is very helpful.