Question

UMAP of TRA/B

3.3 years ago

cgraham13 • 0

Hello,

I have output from a single-cell sequencing run that includes both VDJ and gene expression data. For the same cells, we also used a hybrid capture approach to sequence the TCRs. Comparing the TCR sequences from the two approaches, I found a set of TRA and TRB CDR3 sequences that overlap; for TRA, 101 cells were identified by both approaches. From here, I would like to use DimPlot to see where these cells fall on the UMAP of the clustering.

I used the code below to integrate the VDJ and gene expression data:

library(Seurat)

vdj_dir <- "/Users/carlygraham/Dropbox/BramsonLab/scRNAseq-Feb16/Multi_TAC_Output_v2/vdj"

# Per-contig table from Cell Ranger; keep one row per cell barcode.
tcr <- read.csv(file.path(vdj_dir, "filtered_contig_annotations.csv"))

tcr <- tcr[!duplicated(tcr$barcode), ]

# Only keep the barcode and clonotype columns.
# We'll get additional clonotype info from the clonotype table.

tcr <- tcr[, c("barcode", "raw_clonotype_id")]
names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"

# Clonotype-centric info.
clono <- read.csv(file.path(vdj_dir, "clonotypes.csv"))

# Slap the AA sequences onto our original table by clonotype_id.
tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])

# merge() puts clonotype_id first: reorder so barcodes are the first column,
# then use them as rownames.
tcr <- tcr[, c(2, 1, 3)]
rownames(tcr) <- tcr[, 1]
tcr[, 1] <- NULL

# Add to the Seurat object's metadata (rows are matched to cells by rowname).
scRNAseq.seurat <- AddMetaData(object = scRNAseq.seurat, metadata = tcr)
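To see what the merge-and-rowname steps above produce, here is a minimal base-R sketch on two toy tables standing in for filtered_contig_annotations.csv and clonotypes.csv. The barcodes and CDR3 strings are copied from the head() output in this post, but the clonotype1/clonotype2 assignments and the reduced column set are hypothetical. It also assumes that all contigs of a barcode share one raw_clonotype_id, which is what the duplicated() step relies on.

```r
# Toy stand-ins for the two Cell Ranger tables (hypothetical clonotype IDs,
# only the columns used above).
contigs <- data.frame(
  barcode = c("AAACCTGAGCGTGTCC-1", "AAACCTGAGCGTGTCC-1", "AAACCTGAGTGCGATG-1"),
  raw_clonotype_id = c("clonotype1", "clonotype1", "clonotype2"),
  stringsAsFactors = FALSE
)
clono <- data.frame(
  clonotype_id = c("clonotype1", "clonotype2"),
  cdr3s_aa = c("TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
               "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF"),
  stringsAsFactors = FALSE
)

# One row per barcode (assumes every contig of a cell has the same clonotype).
tcr <- contigs[!duplicated(contigs$barcode), c("barcode", "raw_clonotype_id")]
names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"

# Join on the shared column; merge() returns clonotype_id as the first column.
tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])

# Barcodes become rownames, which is what AddMetaData() matches against cells.
rownames(tcr) <- tcr$barcode
tcr$barcode <- NULL
print(tcr)
```

The result has one row per barcode and the columns clonotype_id and cdr3s_aa, so every chain of a cell ends up in a single cdr3s_aa string.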

Running the code above gives me a Seurat object with the CDR3 sequences stored as metadata:

head(scRNAseq.seurat$cdr3s_aa)

AAACCTGAGCGTGTCC-1 "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF"
AAACCTGAGGCTCAGA-1 "TRB:CASSVPPGNTEAFF;TRA:CALSEGGLMYSGGGADGLTF;TRA:CAVGHSSGSARQLTF"
AAACCTGAGTGCGATG-1 "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF"
AAACCTGCAAAGTGCG-1 "TRB:CASSEWGRGDTQYF;TRB:CASSHASIGNNEQFF;TRA:CAVRDQGRLMF;TRA:CAVTVNTNAGKSTF"
AAACCTGCAAGGGTCA-1 "TRB:CASSRGWRQETQYF;TRA:CAAPINFGNEKLTF"
AAACCTGCACGCGAAA-1 "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF"
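Each cdr3s_aa value above is a single string holding every chain detected in that cell, separated by ";". The base-R sketch below uses the six printed entries as a toy vector to split those strings into chains. It also shows that an exact %in% comparison against one chain such as "TRA:CAYGPPPAGNMLTF" finds nothing, because no entry consists of that chain alone.

```r
# The six entries printed by head() above, as a named character vector.
cdr3s_aa <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGAGGCTCAGA-1" = "TRB:CASSVPPGNTEAFF;TRA:CALSEGGLMYSGGGADGLTF;TRA:CAVGHSSGSARQLTF",
  "AAACCTGAGTGCGATG-1" = "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF",
  "AAACCTGCAAAGTGCG-1" = "TRB:CASSEWGRGDTQYF;TRB:CASSHASIGNNEQFF;TRA:CAVRDQGRLMF;TRA:CAVTVNTNAGKSTF",
  "AAACCTGCAAGGGTCA-1" = "TRB:CASSRGWRQETQYF;TRA:CAAPINFGNEKLTF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF"
)

# Whole-string comparison: FALSE everywhere, since each entry lists several chains.
exact_hit <- cdr3s_aa %in% "TRA:CAYGPPPAGNMLTF"
any(exact_hit)

# Split each entry into its chains; strsplit() keeps the barcodes as list names.
chains <- strsplit(cdr3s_aa, ";", fixed = TRUE)

# Alpha chains of each cell, and how many alpha chains each cell has.
tra <- lapply(chains, function(x) x[startsWith(x, "TRA:")])
lengths(tra)
```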

From here, is there a way to pull out a subset of these cells and display them on the UMAP?

For example, how could I pull out the cells with TRA:CAYGPPPAGNMLTF or TRA:CAAREGDKIIF?

Thanks!

seurat r • 1.8k views

Written 3.3 years ago by cgraham13 • 0; updated 3.3 years ago by rpolicastro 13k

score 0 · Answer 1 · 2021-08-19

3.3 years ago

rpolicastro 13k

seqs <- c("TRA:CAYGPPPAGNMLTF", "TRA:CAAREGDKIIF")

Some plotting functions, including DimPlot, have a cells argument to which you can pass the names of the cells to plot.

cells <- names(scRNAseq.seurat$cdr3s_aa[scRNAseq.seurat$cdr3s_aa %in% seqs])

DimPlot(scRNAseq.seurat, cells=cells, group.by="cdr3s_aa")

You can also subset the Seurat object and plot that.

seu_subset <- subset(scRNAseq.seurat, subset = cdr3s_aa %in% seqs)

DimPlot(seu_subset, group.by="cdr3s_aa")

I expect OP will need partial matches, so a few grepl statements may be needed.

Reply by jared.andrews07 ★ 18k · 3.3 years ago
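A minimal base-R sketch of the partial-match idea: grepl() with fixed = TRUE flags every cell whose cdr3s_aa string contains any of the query chains. The three-cell vector below is a toy copy of entries from the head() output in the question. On the real object the same expression could be run on scRNAseq.seurat$cdr3s_aa; cells without a TCR are NA there, and grepl() returns FALSE for NA.

```r
seqs <- c("TRA:CAYGPPPAGNMLTF", "TRA:CAAREGDKIIF")

# Toy stand-in for scRNAseq.seurat$cdr3s_aa (entries copied from the question).
cdr3s_aa <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGAGTGCGATG-1" = "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF"
)

# One logical vector per query (literal substring match, no regex),
# combined with OR so a cell is kept if it contains any of the queries.
hits <- Reduce(`|`, lapply(seqs, function(s) grepl(s, cdr3s_aa, fixed = TRUE)))

cells <- names(cdr3s_aa)[hits]
cells
```

On the Seurat object, the resulting cells vector could be passed to DimPlot(scRNAseq.seurat, cells = cells) as in the answer, or to cells.highlight = cells to show the matches on top of the full UMAP. A substring match would also accept a longer CDR3 that begins with the same residues as the query.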

Thank you for the quick answers!

I think using the cells argument is the way to go, although the code above doesn't quite work. With the first snippet I get an empty plot, and with the second I get the following error:

seu_subset <- subset(scRNAseq.seurat, subset = cdr3s_aa %in% seqs)

Error: No cells found

I think grepl() is required, but I am not sure how to incorporate it. As far as I can tell, the code is not finding entries that contain "TRA:CAYGPPPAGNMLTF" as one of several chains; instead, it only looks for cells whose cdr3s_aa entry is exactly that single sequence. Any help with this would be greatly appreciated!

Reply by cgraham13 • 0 · 3.3 years ago
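One way to get exact chain-level matches (rather than substring matches) is to split each entry on ";" and test the query chains with %in%, labelling each cell by the query chain it carries. The base-R sketch below does this on a toy vector copied from the question; the "no_TCR_cell" entry is hypothetical and stands in for cells with NA (no TCR detected), and the label name query_clone is likewise hypothetical.

```r
seqs <- c("TRA:CAYGPPPAGNMLTF", "TRA:CAAREGDKIIF")

# Toy stand-in for scRNAseq.seurat$cdr3s_aa; "no_TCR_cell" is hypothetical.
cdr3s_aa <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGAGTGCGATG-1" = "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF",
  "no_TCR_cell" = NA
)

# Split into chains; an NA entry becomes an empty character vector.
chains <- lapply(strsplit(cdr3s_aa, ";", fixed = TRUE), function(x) x[!is.na(x)])

# Label each cell with the first query chain it carries exactly, else "other".
query_clone <- vapply(chains, function(x) {
  hit <- seqs[seqs %in% x]
  if (length(hit) > 0) hit[1] else "other"
}, character(1))

query_clone
```

If the labels are computed from scRNAseq.seurat$cdr3s_aa itself, they are named by cell barcode and could be stored with scRNAseq.seurat$query_clone <- query_clone, then plotted with DimPlot(scRNAseq.seurat, group.by = "query_clone") or used in subset(scRNAseq.seurat, subset = query_clone != "other").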

It looks like there may not be any cells in your data whose entry exactly matches that sequence. You might want to check how often each sequence occurs to see what you're dealing with.

sort(table(scRNAseq.seurat$cdr3s_aa))

or

library("dplyr")

count(scRNAseq.seurat[[]], cdr3s_aa, sort=TRUE)

Reply by rpolicastro 13k · 3.3 years ago
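Because each cdr3s_aa value combines all chains of a cell, table() or count() on the raw column tallies whole chain combinations. The base-R sketch below instead counts how many cells carry each individual chain, using a toy vector copied from the question plus a hypothetical "no_TCR_cell" entry standing in for NA cells.

```r
# Toy stand-in for scRNAseq.seurat$cdr3s_aa; "no_TCR_cell" is hypothetical.
cdr3s_aa <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGAGTGCGATG-1" = "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF",
  "no_TCR_cell" = NA
)

# Split each non-missing entry into chains; unique() so a cell counts once per chain.
per_cell <- lapply(strsplit(cdr3s_aa[!is.na(cdr3s_aa)], ";", fixed = TRUE), unique)

# Number of cells carrying each chain, most frequent first.
chain_counts <- sort(table(unlist(per_cell)), decreasing = TRUE)

# Alpha chains only, then look up the two query chains.
tra_counts <- chain_counts[startsWith(names(chain_counts), "TRA:")]
tra_counts[c("TRA:CAYGPPPAGNMLTF", "TRA:CAAREGDKIIF")]
```

Run on the real scRNAseq.seurat$cdr3s_aa, this would show whether the query chains occur at all and in how many cells.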