How do I demultiplex data hashtagged with TotalSeq-B antibodies using hashsolo?
Posted 20 months ago by bioinfo

Hello,

I was given two sets of FASTQ files: one contains the gene expression data and the other contains the hashtag information for samples labeled with TotalSeq-B antibodies. I was asked to analyze the data with hashsolo, and I am a bit confused about how to do that.

If I process the hashtag FASTQ files and run hashsolo on them, how do I then connect the result to the gene expression data?

I read in a paper that the authors ran cellranger count with the hashtags declared as Antibody Capture, loaded the resulting matrix (which contains both the gene expression and the hashtag counts) into Python, and ran hashsolo on it. I tried the same approach: I ran cellranger count with the feature barcode data, declared the hashtags as Antibody Capture, and ended up with a single matrix. However, I could not get hashsolo to work, because the hashtag information was in .var instead of .obs.
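
When the hashtag library is processed on its own rather than together with gene expression in one `cellranger count` run, the cell barcode is what ties a hashtag call to a cell in the expression data. Below is a minimal pandas sketch, assuming both tables are indexed by barcodes written in the same form (for example both ending in `-1`); `link_by_barcode`, the barcodes and the labels are hypothetical. If the barcodes do not line up, running both libraries through a single `cellranger count`, as in the paper, avoids the problem, because both feature types then share the same matrix rows.

```python
import pandas as pd


def link_by_barcode(gex_obs: pd.DataFrame, hash_calls: pd.DataFrame) -> pd.DataFrame:
    """Attach per-cell hashtag calls to gene-expression cell metadata.

    Hypothetical helper: both frames are assumed to be indexed by cell
    barcodes written the same way, e.g. 'AAACCTGAGAAGGCCT-1'.
    """
    # An inner join keeps only barcodes present in both tables,
    # in the order of the gene-expression table.
    linked = gex_obs.join(hash_calls, how="inner")
    missing = gex_obs.index.difference(hash_calls.index)
    if len(missing) > 0:
        print(f"{len(missing)} GEX barcode(s) have no hashtag call")
    return linked


# Toy example with made-up barcodes and calls, for illustration only.
gex_obs = pd.DataFrame(
    {"n_genes": [1200, 950, 1800]},
    index=["AAAC-1", "AAAG-1", "AAAT-1"],
)
hash_calls = pd.DataFrame(
    {"Classification": ["Hashtag_1", "Doublet"]},
    index=["AAAC-1", "AAAT-1"],
)
linked = link_by_barcode(gex_obs, hash_calls)
print(linked)
```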

Thank you


It's hard to say why it's not working for you without seeing your data, your code, and any errors. Hashsolo itself is fairly straightforward: the function accepts an AnnData object of hash counts. CellRanger gives you a combined feature-barcode matrix, so what you need to do is load the CellRanger output, create an AnnData object containing only the hash counts, and feed that into the hashsolo function.
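
A sketch of that extraction step: it pulls the hashtag columns out of a combined CellRanger matrix into a cells x hashtags count table indexed by barcode. It assumes the matrix is cells x features (as in AnnData's `.X`) and that hashtags are labelled `Antibody Capture` in the feature types; `hash_count_table` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def hash_count_table(X, barcodes, feature_names, feature_types,
                     hash_type="Antibody Capture"):
    """Return a cells x hashtags DataFrame of integer counts (hypothetical helper).

    X is a cells x features matrix (dense NumPy array or SciPy sparse
    matrix), as in a combined CellRanger feature-barcode matrix.
    """
    # Boolean mask selecting the hashtag features.
    mask = np.asarray(feature_types) == hash_type
    sub = X[:, mask]
    # SciPy sparse matrices expose .toarray(); dense arrays are used as-is.
    if hasattr(sub, "toarray"):
        sub = sub.toarray()
    return pd.DataFrame(
        np.asarray(sub).astype(np.int64),
        index=pd.Index(barcodes),
        columns=pd.Index(feature_names)[mask],
    )


# Toy combined matrix: two genes followed by two hashtags (made-up counts).
X = np.array([[5, 0, 120, 3],
              [2, 7, 1, 98]], dtype=np.float32)
names = ["GeneA", "GeneB", "Hashtag_1", "Hashtag_2"]
types = ["Gene Expression", "Gene Expression",
         "Antibody Capture", "Antibody Capture"]
hashes = hash_count_table(X, ["AAAC-1", "AAAG-1"], names, types)
print(hashes)
```

On the object from this thread the call would look like `hash_count_table(anndata.X, anndata.obs_names, anndata.var_names, anndata.var["feature_types"])`. For the standalone hashsolo function these counts would form the matrix of a new AnnData, whereas `sc.external.pp.hashsolo` expects them as columns of `.obs`.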

Please come back with more information.


I ran the cellranger count step with feature barcoding and then loaded the data as shown below:

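    # Import files
    import scanpy as sc
    # gex_only=False keeps the Antibody Capture (hashtag) features next to the genes
    anndata = sc.read_10x_h5('filtered_feature_bc_matrix.h5', gex_only=False)
    anndata.var_names_make_unique()
    # keep an untouched copy of the raw counts
    anndata.layers["counts"] = anndata.X.copy()
    # drop features with zero total counts across all cells
    sc.pp.filter_genes(anndata, min_counts=1)
    anndata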

    AnnData object with n_obs × n_vars = 8714 × 20868
        var: 'gene_ids', 'feature_types', 'genome', 'n_counts'
        layers: 'counts'

    anndata.var["feature_types"].value_counts()

    Gene Expression     20863
    Antibody Capture        5
    Name: feature_types, dtype: int64

    protein = anndata[:, anndata.var["feature_types"] == "Antibody Capture"].copy()
    rna = anndata[:, anndata.var["feature_types"] == "Gene Expression"].copy()

    protein.var_names

    Index(['TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody',
           'TotalSeq-B0302_anti-mouse_Hashtag_2_Antibody',
           'TotalSeq-B0303_anti-mouse_Hashtag_3_Antibody',
           'TotalSeq-B0304_anti-mouse_Hashtag_4_Antibody',
           'TotalSeq-B0305_anti-mouse_Hashtag_5_Antibody'],
          dtype='object')
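
Optionally, the long TotalSeq names can be shortened so the hashtag columns are easier to type, closer to the `['Hash1', 'Hash2', 'Hash3']` style of the scanpy example. This sketch assumes every hashtag name contains `Hashtag_<number>`, as in the output above; `short_hash_name` is a hypothetical helper.

```python
import re


def short_hash_name(feature_name: str) -> str:
    """Turn e.g. 'TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody' into 'Hashtag_1'."""
    match = re.search(r"Hashtag_\d+", feature_name)
    # Fall back to the original name if the pattern is absent.
    return match.group(0) if match else feature_name


names = [
    "TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody",
    "TotalSeq-B0302_anti-mouse_Hashtag_2_Antibody",
]
short = [short_hash_name(n) for n in names]
print(short)  # ['Hashtag_1', 'Hashtag_2']
```

Applied to the object above, `protein.var_names = [short_hash_name(n) for n in protein.var_names]` should rename the features; check that the new names are still unique.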

I can see the hashtag antibodies in my object, but they are in .var (as features) rather than in .obs. I was planning to use the command sce.pp.hashsolo(data, ['Hash1', 'Hash2', 'Hash3']) from scanpy (https://scanpy.readthedocs.io/en/stable/generated/scanpy.external.pp.hashsolo.html). However, when I ran sc.external.pp.hashsolo(protein, ['TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody', 'TotalSeq-B0302_anti-mouse_Hashtag_2_Antibody', 'TotalSeq-B0303_anti-mouse_Hashtag_3_Antibody', 'TotalSeq-B0304_anti-mouse_Hashtag_4_Antibody', 'TotalSeq-B0305_anti-mouse_Hashtag_5_Antibody']), I got the error below:

    ---------------------------------------------------------------------------
    KeyError                                  Traceback (most recent call last)
    <ipython-input-56-4cd0bc643f1a> in <cell line: 1>()
    ----> 1 sc.external.pp.hashsolo(protein,['TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody', 'TotalSeq-B0302_anti-mouse_Hashtag_2_Antibody', 'TotalSeq-B0303_anti-mouse_Hashtag_3_Antibody','TotalSeq-B0304_anti-mouse_Hashtag_4_Antibody','TotalSeq-B0305_anti-mouse_Hashtag_5_Antibody'])

    3 frames
    /usr/local/lib/python3.9/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
       6128                 if use_interval_msg:
       6129                     key = list(key)
    -> 6130                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
       6131 
       6132             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

    KeyError: "None of [Index(['TotalSeq-B0301_anti-mouse_Hashtag_1_Antibody',\n       'TotalSeq-B0302_anti-mouse_Hashtag_2_Antibody',\n       'TotalSeq-B0303_anti-mouse_Hashtag_3_Antibody',\n       'TotalSeq-B0304_anti-mouse_Hashtag_4_Antibody',\n       'TotalSeq-B0305_anti-mouse_Hashtag_5_Antibody'],\n      dtype='object')] are in the [columns]"

Am I importing my matrix incorrectly?
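
Nothing in the import step looks wrong; the error comes from pandas. Per its documentation, `sc.external.pp.hashsolo` takes `cell_hashing_columns` as `.obs` columns holding the hashing counts, whereas here the names exist only as features in `.var_names`, so the column lookup fails. The toy sketch below reproduces the same kind of KeyError with a plain DataFrame standing in for `.obs`, then shows that joining per-cell counts in as columns makes the lookup succeed; the barcodes and counts are made up.

```python
import pandas as pd

hash_names = ["Hashtag_1", "Hashtag_2"]

# Stand-in for adata.obs: per-cell metadata with no hashtag columns yet.
obs = pd.DataFrame({"n_counts": [3000, 4100]}, index=["AAAC-1", "AAAG-1"])

try:
    obs[hash_names]
    lookup_failed = False
except KeyError as err:
    # Same kind of error as in the traceback: none of the names are columns.
    lookup_failed = True
    print("KeyError:", err)

# Per-cell hash counts (cells x hashtags), indexed by the same barcodes.
hash_counts = pd.DataFrame(
    [[120, 3], [1, 98]], index=["AAAC-1", "AAAG-1"], columns=hash_names
)

# Once the counts are columns, selecting them by name works.
obs = obs.join(hash_counts)
values = obs[hash_names].values
print(values)
```

On the real data the equivalent would be to join a DataFrame built from `protein.X.toarray()` (index `protein.obs_names`, columns `protein.var_names`) into `anndata.obs` and then call `sc.external.pp.hashsolo(anndata, list(protein.var_names))`; treat this as an untested sketch.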

Thank you


I ended up using hashsolo.hashsolo instead of sc.external.pp.hashsolo, and that worked. I think part of the problem is that after cellranger count, the hashtag information is in .var instead of .obs.
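
After hashsolo has run, a typical next step is to keep the singlets and check how many cells each hashtag contributed. A pandas sketch, assuming the call is stored in an `.obs` column named `Classification` that holds a hashtag name for singlets and `Doublet` or `Negative` otherwise (check this against your own output); `singlet_summary` and the toy labels are hypothetical.

```python
import pandas as pd


def singlet_summary(obs: pd.DataFrame, column: str = "Classification"):
    """Return (singlet rows, cells per hashtag). Hypothetical helper.

    Assumes `column` holds a hashtag name for singlets and the
    labels 'Doublet' / 'Negative' otherwise.
    """
    is_singlet = ~obs[column].isin(["Doublet", "Negative"])
    singlets = obs[is_singlet]
    # Number of singlet cells assigned to each hashtag, sorted by name.
    counts = singlets[column].value_counts().sort_index()
    return singlets, counts


obs = pd.DataFrame(
    {"Classification": ["Hashtag_1", "Doublet", "Hashtag_2", "Negative", "Hashtag_1"]},
    index=["c1", "c2", "c3", "c4", "c5"],
)
singlets, counts = singlet_summary(obs)
print(counts)
```

Provided the barcodes match, `rna[singlets.index].copy()` should then subset the gene-expression AnnData to those cells.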
