I was wondering whether I can still use (trust?) the cite-seq-count tool. It has not been updated for a few years, and I'm not sure whether it is still on par with, for example, cellranger.
I have tested the 5k_pbmc_protein_v3_nextgem data set from 10x Genomics, which contains CITE-Seq data for both gene expression and AB capture, running it with both cellranger (v. 7) and the latest CITE-Seq-count (v. 1.4.5). Both were used with the default parameters.
Unfortunately, the results vary a lot. The differences are not just in one direction: some of the counts are higher in one, some are higher in the other, sometimes even more than double the number of reads per antibody.
Another thing I'm not really sure about when looking at the two data sets is the number of cells returned by the two runs. The filtered cellranger folder contains 5557 cells, while cite-seq-count has "only" 4499 cells, and the two sets have completely different barcodes:
vec1 <- colnames(cs.mtx) # citeseq dgCMatrix
vec2 <- colnames(ab.cr) # cellranger dgCMatrix
vec2 <- gsub(pattern = "-1$", replacement = "", x = vec2) # strip the cellranger "-1" suffix
intersect(vec1, vec2)
# character(0)
Any reason why the barcodes are completely different?
How can I find out which filtering options lead to the different count numbers?
thanks
Assa
Are you using the same whitelists for barcodes etc. with the non-10x software? Cellranger probably has those built in. If the other tool has not been updated recently, why are you trying to use it for a comparison?
I was not using it for a comparison. I was using it as this tool was created exactly for that reason, to count features from AB-bound membrane proteins. That was before I realized that cellranger can do it as well.
After running cellranger I used citeseq-count with a whitelist made from the barcodes listed in the filtered output folder of cellranger. This way I got exactly the same number of cells and all the barcodes that cellranger had also discovered, but the counts here are very low and completely different from the cellranger results.
Where are the barcodes of citeseq-count coming from? I thought they were read in from the reads.
I understand I should use cellranger, as it seems to be doing a better job. I would just like to understand better what is happening.
What's your cite-seq-count command line like? Did you perform any barcode and/or UMI correction on that one?
As mentioned above, I used the default parameters for citeseq-count.
This is my command
The first time I ran it without the whitelist, and the second time, to make it comparable with cellranger, I used the barcodes from the cellranger output as the whitelist.
Without the whitelist, the counts were much better, but not comparable with the gene expression counts from cellranger, as they had different barcodes. The run with the whitelist shows very low counts for the cellranger barcodes in the Antibody Capture matrix.
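One way to make the two runs comparable is to restrict both count tables to the barcodes they share and then look at per-antibody totals side by side. The following is only a minimal sketch: the function names, the `(antibody, barcode) -> count` layout and the toy numbers are assumptions for illustration, not the output format of either tool.

```python
from collections import defaultdict


def per_antibody_totals(counts, barcodes):
    """Sum counts per antibody over the given set of barcodes.

    counts: dict mapping (antibody, barcode) -> integer count
            (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(int)
    for (antibody, barcode), n in counts.items():
        if barcode in barcodes:
            totals[antibody] += n
    return dict(totals)


def compare_runs(counts_a, counts_b):
    """Return {antibody: (total_a, total_b)} using only barcodes seen in both runs."""
    shared = {bc for _, bc in counts_a} & {bc for _, bc in counts_b}
    totals_a = per_antibody_totals(counts_a, shared)
    totals_b = per_antibody_totals(counts_b, shared)
    antibodies = sorted(set(totals_a) | set(totals_b))
    return {ab: (totals_a.get(ab, 0), totals_b.get(ab, 0)) for ab in antibodies}


if __name__ == "__main__":
    # Toy numbers, invented for illustration only.
    run_a = {("CD3", "AAA"): 10, ("CD3", "CCC"): 5, ("CD4", "AAA"): 2}
    run_b = {("CD3", "AAA"): 4, ("CD4", "AAA"): 6, ("CD4", "GGG"): 9}
    for antibody, (a, b) in compare_runs(run_a, run_b).items():
        print(antibody, a, b)
```

If the shared barcode set comes back empty or very small, the barcodes themselves are the first thing to investigate, before any filtering or UMI-correction settings.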
5k_pbmc_protein_v3_nextgem uses TotalSeq-B antibodies. If I remember correctly, the eighth and ninth bases of the protein reads' cell barcodes are replaced by their complements in the library. Cell Ranger handles this automatically so that the protein data can be integrated with RNA, but I doubt cite-seq-count does that. Can you try again with your barcode whitelist modified in the way I described? For example, the barcode output by Cell Ranger as AAACCCATCCCTCTTT should be modified to AAACCCAAGCCTCTTT.
In addition, is -trim 10 really the default for cite-seq-count? The current docs do not mention it being the default, nor do 10xv3 libraries require any kind of trimming if sequenced on an Illumina sequencer or equivalent.
I recently got the same fastq files and I am now trying to process them. I am new to sc analysis and generally not very experienced with bioinformatic tools upstream of R. Could you please show what your cellranger command line is like?
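The whitelist modification suggested in the answer above (complementing the eighth and ninth bases of each cell barcode) can be sketched as below. This is a minimal sketch under the assumption that the description in this thread is accurate for your chemistry. The function names are made up for illustration, and the positions should be checked against 10x documentation before relying on them.

```python
# Complement lookup for single bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def translate_barcode(barcode, positions=(7, 8)):
    """Replace the bases at the given 0-based positions with their complements.

    Positions 7 and 8 (the 8th and 9th bases) follow the description in
    this thread; they are an assumption, not taken from 10x documentation.
    """
    core = barcode.split("-")[0]  # drop a Cell Ranger "-1" suffix if present
    bases = list(core)
    for i in positions:
        bases[i] = COMPLEMENT[bases[i]]
    return "".join(bases)


def build_whitelist(cellranger_barcodes):
    """Map each translated (protein-side) barcode back to its Cell Ranger barcode."""
    return {translate_barcode(bc): bc for bc in cellranger_barcodes}


if __name__ == "__main__":
    mapping = build_whitelist(["AAACCCATCCCTCTTT-1"])
    for translated, original in mapping.items():
        print(translated, original)
```

The keys of the mapping would go into the whitelist file, one barcode per line, and the values allow the resulting columns to be renamed back to the Cell Ranger barcodes afterwards. Applying the translation twice returns the original barcode, so the same function works in both directions.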
10x Genomics has examples of cellranger command lines available here: https://www.10xgenomics.com/support/software/cell-ranger/analysis/running-pipelines/cr-gex-count
There are different versions of cellranger for gene expression, single-cell ATAC-seq etc., so be sure to use the correct one.
Please open a new question, describe your data precisely, what you tried, and where you got stuck.