Bioinformatic analysis of CITE-seq data
CITE-seq uses oligo-tagged antibodies to measure surface proteins in single-cell libraries, and the same approach with hashtag antibodies (cell hashing) lets you multiplex samples. Details here: https://cite-seq.com/
Although software exists, we found the exact methods very unclear, so we present them here.
CITE-seq-Count approach
Many of these details are adapted from this discussion: https://github.com/Hoohm/CITE-seq-Count/issues/5
Terminology:
- hto - hashtag oligo (HTO), i.e. the hashtag reads
- mRNA - read 2 from cellranger containing the actual transcriptome reads
Software needed
- Cellranger - 10x Genomics
- CITE-seq-Count - https://github.com/Hoohm/CITE-seq-Count
- Seurat (R) - https://satijalab.org/seurat/ and in particular https://satijalab.org/seurat/v3.1/hashing_vignette.html
General steps needed
- 1. We use **cellranger** as usual to create the mRNA matrix (UMI counts per cell barcode vs. gene). No hto information is included, because the hashtag reads go into the "undetermined" fastqs!
- 1a. Run Cellranger mkfastq as usual
- 1b. Run Cellranger count as usual
- 2. We use CITE-seq-Count only to create the hto matrix (UMI counts per cell barcode vs. hashtags+polyA). No mRNA transcriptome reads are included, so the hto counts must be matched back to the transcriptome by cell barcode in Seurat in step 3 below.
- 2a. Run CITE-seq-Count on the **undetermined** reads from Cellranger mkfastq (step 1a)
- 3. The resulting matrices, i.e. mRNA from 1b and hto from 2a, are **combined** in Seurat following the hashing vignette listed above
More detailed steps
# Step1: cellranger mkfastq using the standard 10X barcodes
# Result1: 3 fastqs (R1, R2, index) from the transcriptome
# Result2: and 3 fastqs (R1, R2, index) from the hashtags
# Step2: cellranger count
# input: 3 fastqs (R1, R2, index) from the transcriptome
# Result: digital gene-cell expression matrix, used for the counts and for the cell barcode whitelist
# Step3: CITE-seq-Count
# Input: whitelist: the cell barcodes from the transcriptome (step 2)
# Input: R1, R2 from the hashtags, run through CITE-seq-Count
# Result: a hashtag-cell matrix
## Seurat hashing vignette
# Step4: Combine results and check results in Seurat
# https://satijalab.org/seurat/v3.1/hashing_vignette.html
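Steps 3 and 4 both depend on the two runs agreeing on cell barcodes. The Python sketch below shows one way to derive a whitelist from the Cellranger barcode list and to find the barcodes shared by both matrices. It assumes that Cellranger barcodes carry a GEM-well suffix such as `-1` that has to be stripped before matching. The function names and any file names are hypothetical, so check them against your own output.

```python
import gzip
from pathlib import Path


def strip_gem_suffix(barcode: str) -> str:
    """Drop a trailing GEM-well suffix such as '-1' (assumed Cellranger style)."""
    head, sep, tail = barcode.rpartition("-")
    return head if sep and tail.isdigit() else barcode


def read_barcodes(path: Path) -> list[str]:
    """Read one barcode per line from a plain or gzipped text file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_whitelist(barcodes_file: Path, whitelist_file: Path) -> int:
    """Write suffix-free barcodes, one per line, and return how many were written."""
    barcodes = sorted({strip_gem_suffix(bc) for bc in read_barcodes(barcodes_file)})
    whitelist_file.write_text("\n".join(barcodes) + "\n")
    return len(barcodes)


def joint_barcodes(mrna_barcodes, hto_barcodes) -> list[str]:
    """Barcodes present in both matrices, compared without the GEM-well suffix."""
    mrna = {strip_gem_suffix(bc) for bc in mrna_barcodes}
    hto = {strip_gem_suffix(bc) for bc in hto_barcodes}
    return sorted(mrna & hto)
```

The Seurat vignette performs the same intersection on the column names of the two matrices before building the object, so a very small joint set is usually a sign of a suffix or whitelist mismatch rather than of failed hashing.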
Warnings and errors
[WARNING] Read1 length is 28bp but you are using 26bp for Cell and UMI barcodes combined. This might lead to wrong cell attribution and skewed umi counts.
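This warning typically means that CITE-seq-Count was given 10X V2 barcode positions for V3 reads. The cell barcode is 16 bp in both chemistries, but the UMI is 10 bp long in V2 (16 + 10 = 26 bp) and 12 bp in V3 (16 + 12 = 28 bp).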
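The arithmetic behind this warning can be made explicit. The Python sketch below derives the cell barcode and UMI positions in read 1 for each chemistry, taking the cell barcode as 16 bp, and assembles a CITE-seq-Count argument list from them. The flag names (`-cbf`, `-cbl`, `-umif`, `-umil`, `-cells`, `-wl`, `-t`, `-o`) are the ones we understand CITE-seq-Count 1.4 to use, so verify them against `CITE-seq-Count --help` for your version. The helper names and the file names in the usage example are hypothetical.

```python
# 10x cell barcodes are 16 bp in both chemistries; only the UMI length differs.
CELL_BARCODE_LEN = 16
UMI_LEN = {"v2": 10, "v3": 12}


def barcode_layout(chemistry: str) -> dict[str, int]:
    """Return 1-based, inclusive positions of cell barcode and UMI in read 1."""
    umi_len = UMI_LEN[chemistry.lower()]
    return {
        "cbf": 1,                             # cell barcode first base
        "cbl": CELL_BARCODE_LEN,              # cell barcode last base
        "umif": CELL_BARCODE_LEN + 1,         # UMI first base
        "umil": CELL_BARCODE_LEN + umi_len,   # UMI last base
    }


def build_command(r1, r2, tags, whitelist, outdir, chemistry, expected_cells):
    """Assemble the argument list (flag names assumed, see the note above)."""
    layout = barcode_layout(chemistry)
    return [
        "CITE-seq-Count",
        "-R1", r1,
        "-R2", r2,
        "-t", tags,
        "-cbf", str(layout["cbf"]),
        "-cbl", str(layout["cbl"]),
        "-umif", str(layout["umif"]),
        "-umil", str(layout["umil"]),
        "-cells", str(expected_cells),
        "-wl", whitelist,
        "-o", outdir,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = build_command(
        "Undetermined_R1.fastq.gz", "Undetermined_R2.fastq.gz",
        "tags.csv", "whitelist.csv", "hto_out", "v3", 5000,
    )
    print(" ".join(cmd))
```

With `"v3"` the UMI ends at position 28, which matches a 28 bp read 1. With `"v2"` it ends at position 26, which is the combination the warning above complains about.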
As an alternative to Seurat's vignette, I would recommend the corresponding chapter of the OSCA book, which explains how to analyze the protein abundance measurements from CITE-seq data (but doesn't yet go into the details of hashtags).