Hi,
I have to analyze a single-cell dataset generated with 10x in which 3 TotalSeq-B antibodies were used to multiplex 3 biological samples into a single 10x run. The 10x processing and sequencing were performed externally, so I don't have access to all the details of exactly what was done.
I have limited experience with single-cell analyses/Seurat and none with antibody tags. I am a bit lost about what I should expect at each step and how I should proceed.
I followed the documentation on 10x's website on how to use external antibody libraries to demultiplex samples.
As stated in the documentation, I created a copy of all the antibody FASTQ files (*_FB_*.fastq.gz) with the first read removed from each of them. I used these copies as the Antibody Capture FASTQs and the original files as the Multiplexing Capture FASTQs. Then, I ran cellranger multi as follows:
Run command:
$ cat cellranger_multi_SSt_L1_4.sh
#!/bin/bash
module load cellranger/7.2.0
sample=SSt_L1_4
cellranger multi --id=${sample} --csv=${sample}_multiConfig.csv --localcores=8
This is the multiConfig.csv file:
$ cat SSt_L1_4_multiConfig.csv
[gene-expression]
reference,<path>/reference/cellranger_mm10_2020A/refdata-gex-mm10-2020-A/
cmo-set,<path>/run_multi/SSt_L1_4_MUX.csv
[feature]
reference,<path>/run_multi/SSt_L1_4_ANTIBODY.csv
[libraries]
fastq_id,fastqs,feature_types
SSt_L1_4_GEX,<path>/fastq/fastqs-gex,Gene Expression
SSt_L1_4_FB,<path>/fastq/fastqs-ab,Antibody Capture
SSt_L1_4_FB,<path>/fastq/fastqs-mux,Multiplexing Capture
[samples]
sample_id,cmo_ids
sample1,sample1
sample2,sample2
sample3,sample3
This is the multiplexing tags/CMO file:
$ cat SSt_L1_4_MUX.csv
id,name,read,pattern,sequence,feature_type
sample1,sample1,R2,5PNNNNNNNNNN(BC),GGTCGAGAGCATTCA,Multiplexing Capture
sample2,sample2,R2,5PNNNNNNNNNN(BC),CTTGCCGCATGTCAT,Multiplexing Capture
sample3,sample3,R2,5PNNNNNNNNNN(BC),AAAGCATTCTTCACG,Multiplexing Capture
And, finally, this is the antibody reference file:
$ cat SSt_L1_4_ANTIBODY.csv
id,name,read,pattern,sequence,feature_type
tsB0302,tsB0302,R2,5PNNNNNNNNNN(BC),GGTCGAGAGCATTCA,Antibody Capture
tsB0303,tsB0303,R2,5PNNNNNNNNNN(BC),CTTGCCGCATGTCAT,Antibody Capture
tsB0304,tsB0304,R2,5PNNNNNNNNNN(BC),AAAGCATTCTTCACG,Antibody Capture
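Since the CMO file and the antibody reference above are meant to carry the same three barcode sequences, generating both from a single list is one way to rule out a copy-paste typo between them. A minimal Python sketch, assuming the column layout shown above; the helper name `reference_csv` is hypothetical, and the antibody-to-sample pairing is taken from the matching sequences in the two files.

```python
import csv
import io

# (antibody id, sample name, barcode) copied from the two CSVs above.
TAGS = [
    ("tsB0302", "sample1", "GGTCGAGAGCATTCA"),
    ("tsB0303", "sample2", "CTTGCCGCATGTCAT"),
    ("tsB0304", "sample3", "AAAGCATTCTTCACG"),
]

HEADER = ["id", "name", "read", "pattern", "sequence", "feature_type"]
PATTERN = "5PNNNNNNNNNN(BC)"  # pattern used in both files above


def reference_csv(feature_type, use_sample_ids):
    """Render one feature-reference CSV as text (hypothetical helper).

    use_sample_ids=True writes the sample names as id/name (CMO file);
    False writes the antibody IDs (antibody reference).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    for ab_id, sample, seq in TAGS:
        label = sample if use_sample_ids else ab_id
        writer.writerow([label, label, "R2", PATTERN, seq, feature_type])
    return buf.getvalue()


mux_csv = reference_csv("Multiplexing Capture", use_sample_ids=True)
ab_csv = reference_csv("Antibody Capture", use_sample_ids=False)
print(mux_csv)
print(ab_csv)
```

Writing `mux_csv` and `ab_csv` out as SSt_L1_4_MUX.csv and SSt_L1_4_ANTIBODY.csv should reproduce the content of the two files above, with the sequences guaranteed to match.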
After a first "successful" run, 154 of 25,598 cells were assigned to sample2, 182 were Blank, and the remaining 25,262 were left Unassigned. No cells were assigned to either sample1 or sample3. Seeing this, I explored the assignment_confidence_table.csv file, and this is what I saw:
Distribution of Assignment_Probability:
Only cells with a (max) Assignment_Probability >= 0.9 (the default threshold) are assigned, and all cells passing that threshold happen to be assigned to sample2. Blanks have a probability of 1. Most cells have a probability of around 0.4. Lowering the probability threshold will include extra cells, but it probably won't fix the bigger issue.
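The thresholding described above can be approximated as "take the most likely label and keep it only if its probability clears the cutoff". This is a simplified Python sketch, not Cell Ranger's actual model (which also has Blank and Multiplet outcomes); `call_sample` and the probabilities are made up for illustration. As far as I know, the cutoff corresponds to the `min-assignment-confidence` option in the `[gene-expression]` section of the multi config.

```python
def call_sample(probs, min_confidence=0.9):
    """Return the most likely label if its probability clears the cutoff.

    probs maps label -> probability for one cell (hypothetical input, e.g.
    one row of assignment_confidence_table.csv). Simplified model only.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= min_confidence else "Unassigned"


# Made-up cells: one confident call and one split between sample1/sample3.
confident = {"sample1": 0.03, "sample2": 0.95, "sample3": 0.02}
split = {"sample1": 0.41, "sample2": 0.15, "sample3": 0.40, "Blank": 0.04}

print(call_sample(confident))                  # sample2
print(call_sample(split))                      # Unassigned
print(call_sample(split, min_confidence=0.4))  # sample1
```

The last call shows why lowering the cutoff alone is risky: a cell split almost evenly between sample1 and sample3 is forced into whichever label is marginally higher.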
When looking at the probabilities of the individual samples separately, it is clear that something odd is happening. The probability distribution for sample2 seems OK-ish, with a wide range. However, for both sample1 and sample3, the probability peaks at around 0.4.
Next, I compared the probability of each sample vs the others:
This makes it obvious that something odd is going on. Higher probabilities of assigning a cell to sample2 correlate with lower probabilities of assigning it to sample1 or sample3. However, the probabilities of assigning a cell to sample1 and to sample3 are clearly positively correlated with each other. That's probably why neither can go beyond ~0.4.
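A short bound makes the ceiling explicit, assuming that for each cell the probabilities in assignment_confidence_table.csv over sample1, sample2, sample3 and the remaining labels (Blank and any others, lumped into p_other) sum to 1:

```latex
p_{1} + p_{2} + p_{3} + p_{\mathrm{other}} = 1, \qquad p_{1} = p_{3} = p
\;\Longrightarrow\; 2p = 1 - p_{2} - p_{\mathrm{other}} \le 1
\;\Longrightarrow\; p \le \tfrac{1}{2} < 0.9
```

Under that assumption, any cell whose sample1 and sample3 probabilities move together can never reach the default 0.9 cutoff, and with p around 0.4 for each, only about 0.2 is left for sample2 and the other labels.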
Why could this be happening? Are these antibodies wrong? Are their sequences too similar?
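One quick way to test the "too similar" hypothesis is to count mismatches between every pair of barcodes. A Python sketch using the three sequences from SSt_L1_4_MUX.csv; the `hamming` helper is written here for illustration.

```python
from itertools import combinations

# Barcode sequences copied from SSt_L1_4_MUX.csv above.
BARCODES = {
    "sample1": "GGTCGAGAGCATTCA",
    "sample2": "CTTGCCGCATGTCAT",
    "sample3": "AAAGCATTCTTCACG",
}


def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


distances = {
    (s, t): hamming(BARCODES[s], BARCODES[t])
    for s, t in combinations(sorted(BARCODES), 2)
}
for pair, d in distances.items():
    print(pair, d)
# ('sample1', 'sample2') 12
# ('sample1', 'sample3') 13
# ('sample2', 'sample3') 12
```

Every pair differs at 12 or more of the 15 positions, so on this check alone the barcode sequences do not look too similar to tell apart.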
The log-transformed counts show a similar picture:
Summary of sampleX_cnts (log-transformed counts):
sample         Min.   1st Qu.  Median  Mean   3rd Qu.  Max.
sample1_cnts   0.000  1.663    1.813   1.786  1.929    4.444
sample2_cnts   0.000  1.342    1.505   1.569  1.778    4.721
sample3_cnts   0.000  1.255    1.431   1.443  1.613    4.107
Histogram:
Sample counts vs (max) probability:
Sample counts vs (sample's) probability:
Surprisingly, only the counts for sample2 correlate nicely with the probability of assigning a cell to that sample.
The counts for samples 1 and 3 do not appear to correlate with their assignment probability. They form a mostly flat line with somewhat higher counts at higher probabilities, but nothing as clear as for sample2.
Sample X counts vs sample Y counts:
Directly comparing the log-transformed counts between samples doesn't show any clear pattern. In all cases there is a "shared blob" and a "sample-exclusive peak". Nothing in these plots points to a clear reason why the assignment probabilities of sample1 and sample3 are so strongly correlated, since the cells with the highest counts for either sample have more moderate counts for the other.
Can somebody explain what is going on here?
What is the expected result of using 3 separate antibodies on 3 separate samples?
I also did a "naive" run of cellranger count and explored the resulting filtered_feature_bc_matrix.h5 file (25k cells with >500 counts) with Seurat, and this is what I got:
A number of cells have very high (raw) counts for a single antibody.
However, when calculating the percentage of each of the 3 antibodies in each cell, most cells show a mixture of all 3 antibodies. Some cells even have an almost 50/50 split between two antibodies.
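The per-cell percentages can be computed as each antibody's share of that cell's total tag counts. A minimal Python sketch, assuming raw counts per cell keyed by the antibody IDs from the reference file; `tag_fractions`, `dominant_tag` and the 0.8 cutoff are hypothetical choices for illustration, and the counts are made up.

```python
def tag_fractions(counts):
    """Share of each tag within one cell's total tag counts.

    counts maps tag -> raw UMI count for that cell (hypothetical input).
    Returns an empty dict for cells with no tag counts at all.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def dominant_tag(counts, min_fraction=0.8):
    """Return the top tag if it holds at least min_fraction of the counts.

    min_fraction=0.8 is an arbitrary illustrative cutoff, not a standard.
    """
    fractions = tag_fractions(counts)
    if not fractions:
        return None
    tag, frac = max(fractions.items(), key=lambda kv: kv[1])
    return tag if frac >= min_fraction else None


# Made-up counts: one clean cell and one roughly 50/50 cell.
clean = {"tsB0302": 12, "tsB0303": 950, "tsB0304": 38}
mixed = {"tsB0302": 480, "tsB0303": 40, "tsB0304": 500}

print(dominant_tag(clean))  # tsB0303
print(dominant_tag(mixed))  # None
```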
edit: Using stricter filters that restrict the analysis to 15k cells with >1000 counts and >500 features shows exactly the same pattern:
Is this the expected result of using TotalSeq-B, or did something go wrong in the lab? (This is my first time analyzing results from this technology, so I don't know what to expect.) Why do all cells/samples have so many reads from antibodies that shouldn't be present on them?
Are the TotalSeq results "normal", and am I just using cellranger multi wrong? Can you see where my parameters are wrong? Why are the antibodies for samples 1 and 3 detected at such similar rates? Is there a typo in the antibody sequences?
Is cellranger multi the right tool for this job, or should I just use cellranger count and Seurat's functions like HTODemux?
Thanks in advance
edit: I have added context for the .h5 file used, updated the 2 associated antibody count figures to reflect the number of cells used, and added a new figure using stricter cell filtering.
Can you put a threshold on the total count per cell before making the last plots? I suspect the "middle" of the simplex is poisoned by background barcodes with few UMIs.
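With only a handful of tag UMIs, proportions are coarse: a barcode with counts of 1, 1 and 1 sits exactly at the center of the simplex (one third each). A Python sketch of the suggested filter, dropping barcodes below a minimum total tag count before converting to proportions; `filter_and_normalize`, the barcodes and `min_total=50` are illustrative assumptions, not recommended values.

```python
def filter_and_normalize(cells, min_total=50):
    """Drop barcodes with few total tag counts, then convert to proportions.

    cells maps barcode -> {tag: count}. min_total is an illustrative
    threshold, not a recommended value.
    """
    kept = {}
    for barcode, counts in cells.items():
        total = sum(counts.values())
        if total >= min_total:
            kept[barcode] = {tag: n / total for tag, n in counts.items()}
    return kept


# Made-up barcodes: one background-like, one with a clear single tag.
cells = {
    "AAAC-1": {"tsB0302": 1, "tsB0303": 1, "tsB0304": 1},
    "AAAG-1": {"tsB0302": 5, "tsB0303": 300, "tsB0304": 10},
}
kept = filter_and_normalize(cells)

print(sorted(kept))                           # ['AAAG-1']
print(round(kept["AAAG-1"]["tsB0303"], 3))   # 0.952
```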
Hi, the last plots were generated using the filtered_feature_bc_matrix.h5 file, which contains 25k cells with >500 counts. I have updated the post and figures to clearly reflect the number of cells involved. I have also generated a new figure using a stricter filter (>1000 counts and >500 features) with 15k cells, and the resulting figure is basically the same: good/decent-quality cells have mixed proportions of all 3 antibodies. Is this the intended result with TotalSeq-B?