Is anyone familiar with the internals of CellRanger, specifically how it compares the cellular barcodes (CBs) obtained from a cDNA library with those obtained from a feature barcode library? I inherited a project where the preprocessing was done with CellRanger, which output several thousand "good" cells (i.e. with matched CBs for both the cDNA and feature barcode libraries), but I am completely unable to replicate this with independent methods. I preprocessed the data with Alevin, which yielded literally no overlap between the barcodes detected for the cDNA and feature barcode libraries. I then tried aligning the CBs (the first 16bp of R1 of the feature barcode libraries) to the CBs that CellRanger returned, resulting in a < 1% mapping rate with bowtie2, even in lenient --very-fast mode.
I verified with the same strategy that both the cDNA and feature barcode CBs align at > 95% to the 10X-provided 3M whitelist, so that is not the issue. These are 10X libraries and all other QC is good. Still, there is no overlap between the cDNA and feature barcode CBs.
Any comments on this? Rob, I hope you see this: is there anything fundamentally different in how Alevin and CR treat/identify CBs?
Hi Avi, thanks, I will go through it!
Oh wow, indeed, the 3M barcode translation list you provided solved the issue. Thanks for taking the time!
For the general audience, here is the 10x article explaining why that barcode translation was necessary (it applies when using TotalSeqB or C):
https://kb.10xgenomics.com/hc/en-us/articles/360031133451-Why-is-there-a-discrepancy-in-the-3M-february-2018-txt-barcode-whitelist-
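In case it helps others, here is a minimal Python sketch of what the translation step amounts to. It assumes the translation list is a two-column, tab-separated table with the cDNA barcode in the first column and the corresponding feature barcode in the second; please check the column order against the actual file. The barcodes below are made-up toy values, and the function names are my own, not part of CellRanger or alevin.

```python
import csv
import io


def load_translation(handle):
    """Build a dict mapping feature barcode -> cDNA barcode.

    Assumption: column 1 is the cDNA barcode and column 2 is the
    feature barcode. Verify this against the real translation file.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        cdna_bc, feature_bc = row[0], row[1]
        mapping[feature_bc] = cdna_bc
    return mapping


def translate(barcodes, mapping):
    """Translate feature barcodes; barcodes without an entry are kept as-is."""
    return {mapping.get(bc, bc) for bc in barcodes}


# Toy translation table (made-up sequences). For the real gzipped file,
# one would pass gzip.open(path, "rt") instead of a StringIO.
toy_table = io.StringIO(
    "AAACCCAAGAAACACT\tAAACCCATCAAACACT\n"
    "AAACCCAAGAAACCAT\tAAACCCATCAAACCAT\n"
    "AAACCCAAGAAACCCA\tAAACCCATCAAACCCA\n"
)
mapping = load_translation(toy_table)

# Toy sets of detected barcodes from the two libraries.
cdna_cells = {"AAACCCAAGAAACACT", "AAACCCAAGAAACCAT", "GGGGGGGGGGGGGGGG"}
feature_cells = {"AAACCCATCAAACACT", "AAACCCATCAAACCAT", "TTTTTTTTTTTTTTTT"}

overlap_before = len(cdna_cells & feature_cells)
overlap_after = len(cdna_cells & translate(feature_cells, mapping))
print(f"shared barcodes before translation: {overlap_before}")
print(f"shared barcodes after translation:  {overlap_after}")
```

Without the translation the two sets share nothing, which matches what I saw; after translating the feature barcodes, the matching cells line up.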
Can I suggest that you mention this in the alevin and alevin-fry feature barcode tutorials? I really could not find this information without your help, and I dare say that my google-fu is relatively well-developed.
Yep, I agree. I'll add a note to the feature barcoding tutorial. ATpoint, thanks for bringing this up.