Question

Single cell demultiplexing using 10x and TotalSeq-B

8 months ago

txema.heredia ▴ 190

Hi,

I have to analyze a single-cell dataset generated with 10x and using 3 TotalSeq-B antibodies to multiplex 3 biological samples into a single 10x run. The 10x processing and sequencing was performed externally, so I don't have access to all the details of what was exactly done.

I have limited experience with single-cell analyses/Seurat and none with antibody tags. I am a bit lost on what to expect at each step and how I should proceed.

I followed the documentation on 10x's website on how to use external antibody libraries to demultiplex samples: documentation

As stated in the documentation, I created a copy of all the antibody FASTQ files ( *_FB_*.fastq.gz ), removing the first read from each copy. I used the copies as the antibody sequence files and the originals as the multiplexing files. Then, I ran cellranger multi as follows:
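For reference, the "copy minus the first read" step can be sketched in Python as below. This is only a minimal sketch, assuming standard four-line FASTQ records; the function names `drop_first_record` and `drop_first_record_gz` are made up for illustration and are not part of any 10x tooling. The same record would have to be dropped from every read file of a set (R1, R2, and so on) so that they stay in sync.

```python
import gzip
import io


def drop_first_record(src, dst):
    """Copy a FASTQ text stream, skipping its first record.

    Assumes every record is exactly four lines (header, sequence, '+', qualities).
    """
    for i, line in enumerate(src):
        if i >= 4:  # lines 0-3 make up the first record
            dst.write(line)


def drop_first_record_gz(in_path, out_path):
    """Same as drop_first_record, but for gzip-compressed files on disk."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        drop_first_record(src, dst)


# Tiny in-memory demonstration with two invented reads.
fastq = "@r1\nACGT\n+\nFFFF\n@r2\nTTTT\n+\nFFFF\n"
out = io.StringIO()
drop_first_record(io.StringIO(fastq), out)
trimmed = out.getvalue()
print(trimmed)
```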

Run command:

$ cat cellranger_multi_SSt_L1_4.sh 
#!/bin/bash
module load cellranger/7.2.0
sample=SSt_L1_4

cellranger multi --id=${sample} --csv=${sample}_multiConfig.csv --localcores=8

This is the multiconfig.csv file:

$ cat SSt_L1_4_multiConfig.csv 
[gene-expression]
reference,<path>/reference/cellranger_mm10_2020A/refdata-gex-mm10-2020-A/
cmo-set,<path>/run_multi/SSt_L1_4_MUX.csv

[feature]
reference,<path>/run_multi/SSt_L1_4_ANTIBODY.csv

[libraries]
fastq_id,fastqs,feature_types
SSt_L1_4_GEX,<path>/fastq/fastqs-gex,Gene Expression
SSt_L1_4_FB,<path>/fastq/fastqs-ab,Antibody Capture
SSt_L1_4_FB,<path>/fastq/fastqs-mux,Multiplexing Capture 

[samples]
sample_id,cmo_ids
sample1,sample1
sample2,sample2
sample3,sample3

This is the multiplexing tags/CMO file:

$ cat SSt_L1_4_MUX.csv 
id,name,read,pattern,sequence,feature_type
sample1,sample1,R2,5PNNNNNNNNNN(BC),GGTCGAGAGCATTCA,Multiplexing Capture
sample2,sample2,R2,5PNNNNNNNNNN(BC),CTTGCCGCATGTCAT,Multiplexing Capture
sample3,sample3,R2,5PNNNNNNNNNN(BC),AAAGCATTCTTCACG,Multiplexing Capture

And, finally, this is the antibody reference file:

$ cat SSt_L1_4_ANTIBODY.csv
id,name,read,pattern,sequence,feature_type
tsB0302,tsB0302,R2,5PNNNNNNNNNN(BC),GGTCGAGAGCATTCA,Antibody Capture
tsB0303,tsB0303,R2,5PNNNNNNNNNN(BC),CTTGCCGCATGTCAT,Antibody Capture
tsB0304,tsB0304,R2,5PNNNNNNNNNN(BC),AAAGCATTCTTCACG,Antibody Capture

After a first "successful" run, 154 out of 25,598 cells were assigned to sample2. 182 were Blank, and the remaining 25,262 were left Unassigned. No cells were assigned to either sample1 or sample3. Seeing this, I explored the assignment_confidence_table.csv file and this is what I saw:

Distribution of Assignment_Probability:

Histogram distribution of Assignment_Probability

Only cells with (max) Probability >= 0.9 (the default parameter) are assigned. All cells passing that threshold happen to be assigned to sample2. Blanks have a probability of 1. Most cells have a Probability around 0.4. Lowering the probability threshold would include extra cells, but it probably won't fix the bigger issues.
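To make the thresholding explicit, here is a simplified sketch of the rule as I understand it: a cell goes to its highest-probability tag only if that probability reaches the threshold. The function `assign_cell` and the probabilities below are invented for illustration; Cell Ranger's actual model also has Blank and Multiplet states that this sketch ignores.

```python
def assign_cell(probs, threshold=0.9):
    """Return the tag with the highest probability, or 'Unassigned'.

    probs: dict mapping tag name -> assignment probability for one cell.
    This is a simplification and not Cell Ranger's real implementation.
    """
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else "Unassigned"


# Invented example cells, loosely mimicking the pattern described above.
cells = {
    "cell_a": {"sample1": 0.02, "sample2": 0.95, "sample3": 0.03},
    "cell_b": {"sample1": 0.40, "sample2": 0.20, "sample3": 0.40},
}
calls = {name: assign_cell(p) for name, p in cells.items()}
print(calls)
```

With probabilities stuck around 0.4 for two tags at once, no threshold above 0.5 can ever assign such a cell, which is why lowering the cutoff slightly would not help.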

When looking at the probabilities of the individual samples separately, it is clear that something odd is happening. The probability distribution for sample2 seems ok-ish, with a wide range. However, for both sample1 and sample3, the probability peaks at around 0.4.

Next, I compared the probability of each sample vs the others:

probability sample X vs probability sample Y

Seeing this, it is obvious that something odd is going on. Higher probabilities of assigning a cell to sample2 correlate with lower probabilities of assigning it to sample1 or sample3. However, the probabilities of assigning a cell to sample1 and to sample3 are clearly correlated with each other. That's probably why they cannot go beyond ~0.4.
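The visual impression from these scatter plots could be quantified with a plain Pearson correlation between the per-cell probability columns. The sketch below uses only the standard library; the `pearson` helper and the five data points are invented for illustration and are not taken from my table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented per-cell probabilities (not real data).
p_sample1 = [0.40, 0.35, 0.10, 0.42, 0.05]
p_sample2 = [0.20, 0.30, 0.80, 0.15, 0.90]
p_sample3 = [0.40, 0.35, 0.10, 0.43, 0.05]

r_13 = pearson(p_sample1, p_sample3)
r_12 = pearson(p_sample1, p_sample2)
print(r_13, r_12)
```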

Why could this be happening? Are these antibodies wrong? Is their sequence too similar?
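One quick way to check the "too similar" hypothesis is to compute pairwise Hamming distances between the three barcode sequences from the reference files above. A minimal sketch (the `hamming` helper is just for illustration):

```python
from itertools import combinations

# Barcode sequences copied from SSt_L1_4_MUX.csv above.
barcodes = {
    "sample1": "GGTCGAGAGCATTCA",
    "sample2": "CTTGCCGCATGTCAT",
    "sample3": "AAAGCATTCTTCACG",
}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


pairwise = {
    (m, n): hamming(barcodes[m], barcodes[n])
    for m, n in combinations(barcodes, 2)
}
for pair, dist in pairwise.items():
    print(pair, dist)
```

Large distances for all three pairs would argue against sequence similarity as the cause.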

When looking at the counts (log transformed) the picture is similar:

Summary of sampleX_cnts:

sample1_cnts
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.663   1.813   1.786   1.929   4.444 

sample2_cnts
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.342   1.505   1.569   1.778   4.721 

sample3_cnts
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.255   1.431   1.443   1.613   4.107

Histogram:

histogram counts all cells

Sample counts vs (max) probability:

sample counts vs max probability

Sample counts vs (sample's) probability:

sample counts vs sample probability

Surprisingly, only the counts for sample2 correlate nicely with the probability of assigning a cell to that sample.

The counts for sample1 and sample3 are not apparently correlated with their assignment probabilities. They mostly form a flat line with somewhat higher counts at higher probabilities, but nothing as clear as in sample2.

Sample X counts vs sample Y counts:

counts vs counts

Directly comparing the log-transformed counts between samples doesn't show any clear pattern. In all cases there is a "shared blob" and a "sample-exclusive peak". Nothing in these plots points to a clear reason why the assignment probabilities of sample1 and sample3 are so strongly correlated, as the cells with the highest counts for either sample have more moderate counts for the other.

Can somebody enlighten me about what is going on here?

What is the expected result of using 3 separate antibodies on 3 separate samples? I also did a "naive" run of cellranger count and explored the resulting filtered_feature_bc_matrix.h5 file (25k cells with >500 counts) with Seurat, and this is what I got:

counts antibody features

A number of cells have very high (raw) counts for a single antibody.

percentage antibody reads

However, when calculating the percentage of abundance of each of the 3 antibodies in each cell, most cells detect a mixture of all 3 antibodies. Some cells even have an almost 50%-50% split between two antibodies.
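For clarity, the per-cell percentages were computed along these lines: each antibody's count divided by the cell's total antibody count. The sketch below shows the idea in Python; `antibody_fractions` and the counts are invented for illustration, and only the antibody names come from my reference file.

```python
def antibody_fractions(counts):
    """Fraction of a cell's antibody counts contributed by each antibody.

    counts: dict mapping antibody name -> raw count for one cell.
    Cells with no antibody counts get 0.0 for every antibody.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


# Invented counts: one cleanly tagged cell and one near 50%-50% mixture.
clean_cell = {"tsB0302": 5, "tsB0303": 190, "tsB0304": 5}
mixed_cell = {"tsB0302": 50, "tsB0303": 0, "tsB0304": 50}

clean = antibody_fractions(clean_cell)
mixed = antibody_fractions(mixed_cell)
print(clean, mixed)
```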

edit: Using stricter filters that restrict the analysis to 15k cells with >1000 counts and >500 features shows exactly the same:

percentage antibody reads with stricter filters

Is this the expected result of using TotalSeq-B, or did something go wrong in the lab? (This is my first time analyzing results from this technology, so I don't know what to expect.) Why are there so many reads from antibodies that shouldn't be present in all cells/samples?

Are the results of TotalSeq "normal" and I am just using cellranger multi wrong? Do you see where my parameters are wrong? Why are the antibodies for samples 1 and 3 detected at such similar rates? Is there a typo in the antibody sequence?

Is cellranger multi the right tool for this job or should I just use cellranger count and use Seurat's functions like HTODemux?

Thanks in advance

edit: I have added context for the .h5 file used, updated the 2 associated antibody counts figures to reflect the number of cells used, and added a new figure using a stricter cell filtering.

demultiplexing cellranger antibodies single-cell • 1.1k views

ADD COMMENT • link 8 months ago by txema.heredia ▴ 190

Can you put a threshold on the total count per cell before making the last plots? I suspect the "middle" of the simplex is poisoned by background barcodes with few UMIs.
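Something like the following is what I have in mind, as a rough sketch only. It assumes the threshold is applied to each barcode's summed counts across the three antibodies; the function name `filter_by_total`, the barcodes and the counts are all made up.

```python
def filter_by_total(cells, min_total):
    """Keep only barcodes whose summed counts reach min_total.

    cells: dict mapping cell barcode -> dict of feature name -> count.
    """
    return {
        barcode: counts
        for barcode, counts in cells.items()
        if sum(counts.values()) >= min_total
    }


# Invented example: the second barcode looks like low-count background.
cells = {
    "AAAC-1": {"tsB0302": 300, "tsB0303": 10, "tsB0304": 12},
    "AAAG-1": {"tsB0302": 3, "tsB0303": 4, "tsB0304": 3},
}
kept = filter_by_total(cells, min_total=100)
print(sorted(kept))
```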

ADD REPLY • link 8 months ago by LChart 4.5k

Hi, the last plots were generated using the filtered_feature_bc_matrix.h5 file. It contains 25k cells with >500 counts. I have updated the post and figures to clearly reflect the number of cells involved.

I have also generated a new figure using a stricter filter ( >1000 counts & >500 features) with 15k cells, and the result is basically the same: good/decent quality cells have mixed proportions of all 3 antibodies. Is this the intended result of TotalSeq-B?

ADD REPLY • link 8 months ago by txema.heredia ▴ 190

score 0 · Answer 1 · 2024-03-14

Definitely reach out to 10X Genomics for support, they're very helpful. One of the problems I ran into before is that the feature barcode sequence has to be the reverse complement of what the vendor gives in order to be properly demuxed. I'm not too familiar with CMO/cell hashing so I skipped through most of your plots, so apologies if that's been discussed.
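If you want to try that, here is a small sketch for generating the reverse complements of the three sequences in your reference files. Whether the reverse complement is actually needed for your chemistry is something I can't confirm, so treat it as an experiment; `reverse_complement` is just a helper written for this example.

```python
# Translation table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Sequences copied from the reference files in the question.
barcodes = {
    "tsB0302": "GGTCGAGAGCATTCA",
    "tsB0303": "CTTGCCGCATGTCAT",
    "tsB0304": "AAAGCATTCTTCACG",
}
rc_barcodes = {name: reverse_complement(seq) for name, seq in barcodes.items()}
for name, seq in rc_barcodes.items():
    print(name, seq)
```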