Question

Counts from cite-seq-count vs. cellranger vary tremendously

20 months ago

Assa Yeroslaviz ★ 1.9k

I was wondering whether I can still use (trust?) the CITE-seq-Count tool. It has not been updated for a few years, and I'm not sure it is still on the level of, for example, cellranger.

I tested the 5k_pbmc_protein_v3_nextgem data set from 10x Genomics, which contains CITE-seq data for both gene expression and antibody (AB) capture, running it with both cellranger (v7) and the latest CITE-seq-Count (v1.4.5). Both were used with the default parameters.

Unfortunately, the results vary a lot. The differences are not all in one direction: some counts are higher with one tool, some with the other, sometimes by more than double the number of reads per antibody.

Another thing I'm not sure about when comparing the two data sets is the number of cells returned by each run. The filtered cellranger folder contains 5557 cells, while CITE-seq-Count has "only" 4499 cells, and the two sets have completely different barcodes:

vec1 <- colnames(cs.mtx) # citeseq dgCMatrix
vec2 <- colnames(ab.cr)  # cellranger dgCMatrix
vec2 <- gsub(pattern = "-1", replacement = "", x = vec2)
intersect(vec1, vec2)

character(0)

Any reason why the barcodes are completely different?

How can I find out which filtering options lead to the different counts?

Thanks

Assa

citeseq-count cellranger cite-seq scRNA-seq • 2.4k views

ADD COMMENT • link updated 15 months ago by ATpoint 87k • written 20 months ago by Assa Yeroslaviz ★ 1.9k

Are you using the same barcode whitelists etc. with the non-10x software? Cellranger probably has those built in. If the other tool has not been updated recently, why are you trying to use it for a comparison?

ADD REPLY • link 20 months ago by GenoMax 150k

I was not using it for a comparison. I was using it because this tool was created for exactly that purpose: counting features from antibody-bound membrane proteins. That was before I realized that cellranger can do it as well.

After running cellranger, I ran CITE-seq-Count with a whitelist made from the barcodes listed in cellranger's filtered output folder. This way I got exactly the same number of cells and all the barcodes that cellranger found, but the counts are very low and completely different from the cellranger results.

Where do the CITE-seq-Count barcodes come from?

I thought they were taken from the reads.

I understand that I should use cellranger, as it seems to be doing a better job. I would just like to understand better what is happening.

ADD REPLY • link 20 months ago by Assa Yeroslaviz ★ 1.9k

What does your CITE-seq-Count command line look like? Did you perform any barcode and/or UMI correction in that run?

ADD REPLY • link 20 months ago by John Ma ▴ 310

As mentioned above, I used the default parameters for CITE-seq-Count.

This is my command:

CITE-seq-Count -T 20 -trim 10 \
-wl cellranger_barcodes_2_whitelist.txt \
-R1 5k_pbmc_protein_v3_nextgem_antibody_S2_L002_R1_001.fastq.gz \
-R2 5k_pbmc_protein_v3_nextgem_antibody_S2_L002_R2_001.fastq.gz \
-t citeSeq_feature.csv \
-cbf 1 -cbl 16 -umif 17 -umil 28 -cells 5500 -o citeSeq_out/

The first time, I ran it without the whitelist. The second time, to make it comparable with cellranger, I used the barcodes from the cellranger output as the whitelist.

Without the whitelist, the counts were much better, but not comparable with the gene expression counts from cellranger, as they had different barcodes. The run with the whitelist shows very low counts for the cellranger barcodes in the Antibody Capture matrix.

ADD REPLY • link 20 months ago by Assa Yeroslaviz ★ 1.9k

5k_pbmc_protein_v3_nextgem uses TotalSeq-B antibodies. If I remember correctly, the eighth and ninth bases of the cell barcodes in the protein reads are replaced by their complements in the library. Cell Ranger handles this automatically so that the protein data can be integrated with the RNA data, but I doubt CITE-seq-Count does that. Can you try again with your barcode whitelist modified in the way I described? For example, the barcode that Cell Ranger outputs as AAACCCATCCCTCTTT should be changed to AAACCCAAGCCTCTTT.
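The whitelist modification described above could be sketched roughly as follows. This is only an illustration of the rule as stated in this reply (complement bases 8 and 9 of each cell barcode), not a confirmed description of what Cell Ranger does internally. The function name `translate_barcodes` is made up for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: complement the 8th and 9th base of every barcode
# read from stdin, following the rule described in the reply above.
translate_barcodes() {
  awk '
    BEGIN { comp["A"] = "T"; comp["T"] = "A"; comp["C"] = "G"; comp["G"] = "C" }
    {
      bc = $0
      sub(/-[0-9]+$/, "", bc)          # drop a trailing "-1" style suffix if present
      b8 = substr(bc, 8, 1)
      b9 = substr(bc, 9, 1)
      if (b8 in comp) b8 = comp[b8]    # leave anything that is not A/C/G/T untouched
      if (b9 in comp) b9 = comp[b9]
      print substr(bc, 1, 7) b8 b9 substr(bc, 10)
    }
  '
}

# Example from this thread:
printf 'AAACCCATCCCTCTTT-1\n' | translate_barcodes
# prints AAACCCAAGCCTCTTT
```

Assuming the barcodes come from the filtered Cell Ranger output, something like `zcat filtered_feature_bc_matrix/barcodes.tsv.gz | translate_barcodes > cellranger_barcodes_2_whitelist.txt` would produce a modified whitelist. Because complementing twice restores the original base, the same function should also map the resulting CITE-seq-Count barcodes back to the Cell Ranger ones before comparing the matrices.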

In addition, is -trim 10 really the default for CITE-seq-Count? The current docs do not mention it as the default, nor do 10x v3 libraries require any kind of trimming when sequenced on an Illumina sequencer or equivalent.

ADD REPLY • link 20 months ago by John Ma ▴ 310

I recently got the same fastq files and am now trying to process them. I am new to single-cell analysis and generally not very experienced with bioinformatics tools upstream of R. Could you please show what your cellranger command line looks like?

ADD REPLY • link 15 months ago by Yonghui • 0

10x Genomics has examples of cellranger command lines available here: https://www.10xgenomics.com/support/software/cell-ranger/analysis/running-pipelines/cr-gex-count

There are different versions of cellranger for gene expression, single-cell ATAC-seq, etc., so be sure to use the correct one.

ADD REPLY • link 15 months ago by GenoMax 150k

Please open a new question and describe precisely your data, what you tried, and where you got stuck.

ADD REPLY • link 15 months ago by ATpoint 87k