Bioinformatic analysis of CITE-seq data

Question

Tutorial:scRNA-seq CITE-seq-count bioinformatics

14

5.1 years ago

colindaven 7.4k

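CITE-seq uses oligo-tagged antibodies to measure surface proteins in single cells alongside the transcriptome. The same antibody-oligo principle is used for hashtags (hto), which allow multiplexing of single cell libraries. Details here: https://cite-seq.com/

Although software exists, we found the exact methods very unclear, so we would like to present them here.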

CITE-seq count approach

Many of these details have been adapted from this discussion: https://github.com/Hoohm/CITE-seq-Count/issues/5

Terminology:

hto - hashtag oligo, the antibody-bound barcode used to label each sample
mRNA - read 2 from cellranger containing the actual transcriptome reads

Software needed

Cell Ranger - 10x Genomics
CITE-seq-Count - https://github.com/Hoohm/CITE-seq-Count
Seurat (R) - https://satijalab.org/seurat/ and in particular https://satijalab.org/seurat/v3.1/hashing_vignette.html

General steps needed

 - 1. We use **cellranger** as usual to create an mRNA matrix (cell barcode and UMI vs mRNA). No hto information is included, as the hto reads go into the "undetermined" fastqs!

 - 1a. Run cellranger mkfastq as usual

 - 1b. Run cellranger count as usual

 - 2. We use CITE-seq-Count only to create an hto matrix (cell barcode and UMI vs hashtags+polyA). No mRNA transcriptome reads are included, so the hto counts must be matched to the transcriptome by cell barcode in Seurat in step 3 below.

 - 2a. Run CITE-seq-Count on the **undetermined** reads from cellranger mkfastq (step 1a)

 - 3. The resulting matrices, i.e. mRNA from 1b and hto from 2a, are **combined** in Seurat following the hashtag demux tutorial

More detailed steps

# Step1: cellranger mkfastq using the standard 10X barcodes
# Result1: 3 fastqs (R1, R2, index) from the transcriptome 
# Result2: and 3 fastqs (R1, R2, index) from the hashtags

# Step2: cellranger count
# input: 3 fastqs (R1, R2, index) from the transcriptome
# Result: digital gene-cell expression matrix, used for the counts and for the cell barcode whitelist

# Step3: CITE-seq-Count
# Input: whitelist: use the cell barcodes from the transcriptome (step 2) as a whitelist
# Input: R1 and R2 from the hashtags, run through CITE-seq-Count
# Result: a hashtag-cell matrix

## Seurat hashing vignette
# Step4: Combine results and check results in Seurat
# https://satijalab.org/seurat/v3.1/hashing_vignette.html
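
Steps 2 and 3 above could be sketched in shell roughly as follows. This is a minimal sketch, not the exact commands used for this tutorial: the helper names `make_whitelist` and `run_hto_count`, all file names, the output folder `hto_counts` and the expected cell count are placeholders, and the barcode/UMI positions assume 10x V3 chemistry. The `-R1`, `-R2`, `-t`, `-cbf`, `-cbl`, `-umif`, `-umil`, `-cells`, `-wl` and `-o` options are the ones documented for CITE-seq-Count 1.4; check `CITE-seq-Count --help` for your installed version.

```shell
#!/bin/sh
# Sketch only: file names, paths and the cell count are placeholders.

# Build a whitelist from the Cell Ranger filtered barcodes (step 2).
# Cell Ranger appends a "-1" style suffix to each barcode; it is stripped
# here so the whitelist holds the bare barcode sequences.
make_whitelist() {
    # $1: barcodes.tsv.gz from filtered_feature_bc_matrix, $2: output file
    gzip -dc "$1" | sed 's/-[0-9][0-9]*$//' > "$2"
}

# Count hashtags per cell with CITE-seq-Count (step 3).
# Positions assume V3 chemistry: cell barcode 1-16, UMI 17-28.
run_hto_count() {
    # $1: hashtag R1, $2: hashtag R2, $3: tag list CSV,
    # $4: whitelist, $5: expected number of cells
    CITE-seq-Count -R1 "$1" -R2 "$2" -t "$3" \
        -cbf 1 -cbl 16 -umif 17 -umil 28 \
        -cells "$5" -wl "$4" -o hto_counts
}

# Example (placeholders):
# make_whitelist filtered_feature_bc_matrix/barcodes.tsv.gz whitelist.txt
# run_hto_count hto_R1.fastq.gz hto_R2.fastq.gz tags.csv whitelist.txt 5000
```

With CITE-seq-Count 1.4 the output folder should then contain `umi_count` and `read_count` subfolders in a 10x-style sparse format, which is what the Seurat step reads.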

Warnings and errors

[WARNING] Read1 length is 28bp but you are using 26bp for Cell and UMI barcodes combined. This might lead to wrong cell attribution and skewed umi counts.

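10x chemistry V2 vs V3: the UMI is 10 bp long in V2 but 12 bp in V3. With the 16 bp cell barcode, cell barcode plus UMI is 26 bp in V2 and 28 bp in V3, so this warning typically means V2 positions are being used on V3 data.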
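
The arithmetic behind this warning can be sketched in shell. The helper names `r1_length` and `barcode_layout` are hypothetical; the only assumptions are the 16 bp 10x cell barcode and the V2/V3 UMI lengths mentioned above.

```shell
#!/bin/sh
# Report the length of the first read in a gzipped FASTQ
# (line 2 of a FASTQ file holds the first sequence).
r1_length() {
    gzip -dc "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Print CITE-seq-Count position flags for a given 10x chemistry.
# Cell barcode: bases 1-16; UMI: the next 10 (V2) or 12 (V3) bases.
barcode_layout() {
    case "$1" in
        v2) umi_len=10 ;;
        v3) umi_len=12 ;;
        *)  echo "unknown chemistry: $1" >&2; return 1 ;;
    esac
    cb_len=16
    echo "-cbf 1 -cbl $cb_len -umif $((cb_len + 1)) -umil $((cb_len + umi_len))"
}

# Example (file name is a placeholder):
# r1_length hto_R1.fastq.gz   # 28 suggests V3, 26 suggests V2
# barcode_layout v3
```

If `r1_length` reports 28 while the command line uses `-umil 26`, the last two UMI bases are ignored, which is the situation the warning describes.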

CITE-seq-count scRNA-seq RNA-Seq • 7.7k views

ADD COMMENT • link updated 21 months ago by ATpoint 87k • written 5.1 years ago by colindaven 7.4k

1

As an alternative to Seurat's vignette, I would recommend the corresponding chapter of the OSCA book, which explains how to analyze the protein abundance measurements from CITE-seq data (but doesn't yet go into the details of hashtags).

ADD REPLY • link 4.2 years ago by Friederike 9.0k

score 0 · Answer 1 · 2023-07-13

0

21 months ago

Assa Yeroslaviz ★ 1.9k

Would it be possible for you to add the commands you used in this explanation?

I'm trying to analyze the data from the 10x data sets here, but I'm not sure how to analyze the two data sets separately, as CITE-seq-Count needs R1 from the transcriptome and R2 from the protein information (ADT). However, I have two different folders of results and I'm not sure how to approach this.

thanks in advance

Assa

ADD COMMENT • link 21 months ago by Assa Yeroslaviz ★ 1.9k

1

CellRanger count takes full care of HTOs/ADTs, just precisely follow the manual. R1 is CBs and UMIs and R2 is cDNA/antibody sequence. No magic here. Follow the manual.

ADD REPLY • link 21 months ago by ATpoint 87k

0

Which manual?

These are the files I have:

$ ls -1  -R 5k_pbmc_protein_v3_nextgem_fastqs/
5k_pbmc_protein_v3_nextgem_fastqs/:
5k_pbmc_protein_v3_nextgem_antibody_fastqs
5k_pbmc_protein_v3_nextgem_gex_fastqs

5k_pbmc_protein_v3_nextgem_fastqs/5k_pbmc_protein_v3_nextgem_antibody_fastqs:
5k_pbmc_protein_v3_nextgem_antibody_S2_L001_I1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_antibody_S2_L001_R1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_antibody_S2_L001_R2_001.fastq.gz
5k_pbmc_protein_v3_nextgem_antibody_S2_L002_I1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_antibody_S2_L002_R1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_antibody_S2_L002_R2_001.fastq.gz

5k_pbmc_protein_v3_nextgem_fastqs/5k_pbmc_protein_v3_nextgem_gex_fastqs:
5k_pbmc_protein_v3_nextgem_gex_S1_L001_I1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_gex_S1_L001_R1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_gex_S1_L001_R2_001.fastq.gz
5k_pbmc_protein_v3_nextgem_gex_S1_L002_I1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_gex_S1_L002_R1_001.fastq.gz
5k_pbmc_protein_v3_nextgem_gex_S1_L002_R2_001.fastq.gz

Do I run cellranger count only on the gex folder (rna), or on both?

As you can see, I have R1 and R2 for both the antibody (AB) and the transcriptome libraries. Which files do I use as R1 and R2 in CITE-seq-Count?

ADD REPLY • link 21 months ago by Assa Yeroslaviz ★ 1.9k

1

https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis

ADD REPLY • link 21 months ago by ATpoint 87k

1

I finally figured out how to continue. It took me a while to work out how to run cellranger on this data, and I would like to check that this is correct.

With cellranger I can count both libraries in parallel, so I don't need CITE-seq-Count at all. Are there any advantages to using CITE-seq-Count for counting the antibodies?

To make cellranger work, I listed the two folders in a CSV file as follows:

fastqs,sample,library_type
5k_pbmc_protein_v3_nextgem_gex_fastqs/,5k_pbmc_protein_v3_nextgem_gex,Gene Expression
5k_pbmc_protein_v3_nextgem_antibody_fastqs/,5k_pbmc_protein_v3_nextgem_antibody,Antibody Capture

Each of the folders contains the fastq files from the two lanes of the flowcell. Together with the feature barcode list provided by 10x, I could run cellranger as follows:

/fs/home/yeroslaviz/software/cellranger-7.1.0/cellranger count --id=5k_pbmv \
--libraries=CiteSeq/pbmc_libraries.csv \
--transcriptome=refdata-gex-GRCh38-2020-A \
--feature-ref=CiteSeq/pbmc_protein_v3_nextgem_feature.csv
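
Before launching a run like this, the libraries CSV can be sanity-checked. The following is a minimal sketch with a hypothetical helper, `check_libraries`. It assumes the three-column layout shown above and the `<sample>_S*_R1_001.fastq.gz` naming visible in the file listing earlier in this thread, and it only tests that each `sample` value matches at least one R1 file in its `fastqs` folder.

```shell
#!/bin/sh
# Check that every row of a cellranger libraries CSV points at existing FASTQs.
# Returns non-zero if any sample prefix has no matching R1 file.
check_libraries() {
    csv=$1
    missing=0
    # Read fastqs,sample,library_type per row (header skipped by tail below).
    while IFS=, read -r fastqs sample library_type; do
        [ -n "$fastqs" ] || continue
        # Count R1 files whose name starts with the sample prefix.
        n=$(ls "$fastqs"/"$sample"_S*_R1_001.fastq.gz 2>/dev/null | wc -l | tr -d ' ')
        if [ "$n" -eq 0 ]; then
            echo "no FASTQs for sample '$sample' ($library_type) in $fastqs" >&2
            missing=1
        else
            echo "$library_type: $n R1 file(s) for $sample"
        fi
    done <<EOF
$(tail -n +2 "$csv")
EOF
    return $missing
}

# Example: check_libraries CiteSeq/pbmc_libraries.csv
```

A mismatch between the `sample` column and the FASTQ file name prefix is a common reason for cellranger not finding any reads, so this check is cheap insurance before a long run.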

Now I have the standard output from cellranger, with the matrix and the feature/barcodes folders for filtered and raw data.

Do I understand correctly that both the gene expression counts and the antibody counts are in the same matrix?

How do I efficiently separate them?

thanks again for the link.

ADD REPLY • link 21 months ago by Assa Yeroslaviz ★ 1.9k

1

Yes, they're in the same matrix. Antibody counts are always at the bottom; iirc they have the same names as in the feature-ref CSV file. Just run tail on the count matrix once you've loaded it into R, then you'll see the names.
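
Outside R, the split can also be seen directly in the Cell Ranger output. A minimal sketch, assuming the `features.tsv.gz` layout of Cell Ranger 3 and later, where the third tab-separated column holds the feature type (`Gene Expression` or `Antibody Capture`); the helper name `features_of_type` is hypothetical. The printed row number is the line number in `features.tsv.gz`, which corresponds to the row index used in `matrix.mtx.gz`.

```shell
#!/bin/sh
# Print row number, feature ID and feature name for every feature of one type.
features_of_type() {
    # $1: path to features.tsv.gz, $2: feature type, e.g. "Antibody Capture"
    gzip -dc "$1" | awk -F '\t' -v ftype="$2" \
        '$3 == ftype { print NR "\t" $1 "\t" $2 }'
}

# Example (path is a placeholder built from the run id used above):
# features_of_type 5k_pbmv/outs/filtered_feature_bc_matrix/features.tsv.gz "Antibody Capture"
```

In Seurat, `Read10X()` on such an output folder should return a list with one matrix per feature type, which gives the same separation without manual indexing.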

ADD REPLY • link 21 months ago by ATpoint 87k