Hello,
I have output from a single-cell sequencing run that includes both VDJ and gene expression data. For the same cells, we also used a hybrid capture approach to sequence the TCRs. I compared the TCR sequences from the two approaches and found a list of TRA and TRB CDR3 sequences that overlap. For the TRA sequences, 101 cells were identified by both approaches. I would now like to use DimPlot to show on a UMAP where these cells fall in the clustering.
I have used the code below to integrate the VDJ and gene expression data:
tcr <- read.csv(paste0("/Users/carlygraham/Dropbox/BramsonLab/scRNAseq-Feb16/Multi_TAC_Output_v2/vdj/", "filtered_contig_annotations.csv"))
# Keep one row per barcode (the contig table can list several contigs per cell).
tcr <- tcr[!duplicated(tcr$barcode), ]
# Only keep the barcode and clonotype columns.
# We'll get additional clonotype info from the clonotype table.
tcr <- tcr[,c("barcode", "raw_clonotype_id")]
names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"
# Clonotype-centric info.
clono <- read.csv(paste0("/Users/carlygraham/Dropbox/BramsonLab/scRNAseq-Feb16/Multi_TAC_Output_v2/vdj/", "clonotypes.csv"))
# Slap the AA sequences onto our original table by clonotype_id.
tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])
# Set the barcodes as rownames and drop the barcode column.
rownames(tcr) <- tcr$barcode
tcr$barcode <- NULL
# Add to the Seurat object's metadata.
scRNAseq.seurat <- AddMetaData(object=scRNAseq.seurat, metadata=tcr)
This gives me a Seurat object with the CDR3 sequences as metadata:
head(scRNAseq.seurat$cdr3s_aa)
AAACCTGAGCGTGTCC-1 "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF"
AAACCTGAGGCTCAGA-1 "TRB:CASSVPPGNTEAFF;TRA:CALSEGGLMYSGGGADGLTF;TRA:CAVGHSSGSARQLTF"
AAACCTGAGTGCGATG-1 "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF"
AAACCTGCAAAGTGCG-1 "TRB:CASSEWGRGDTQYF;TRB:CASSHASIGNNEQFF;TRA:CAVRDQGRLMF;TRA:CAVTVNTNAGKSTF"
AAACCTGCAAGGGTCA-1 "TRB:CASSRGWRQETQYF;TRA:CAAPINFGNEKLTF"
AAACCTGCACGCGAAA-1 "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF"
From here, is there a way to pull out some of the cells and display them on the UMAP?
For example, can I pull out the cells with TRA:CAYGPPPAGNMLTF and TRA:CAAREGDKIIF?
Thanks!
I expect OP will need partial matches, so a few grepl statements may be needed.
Thank you for the quick answers!
I think using the cells argument is the way to go. However, the code above doesn't quite work: with the top code I get an empty plot, and with the bottom I get the following error:
Error: No cells found
I think using grepl() is required but I am not quite sure how to incorporate it. As far as I can tell, right now I am not pulling out the partial match of "TRA:CAYGPPPAGNMLTF" but instead looking for cells with this as the only entry in the cdr3s_aa column. Any help with this would be greatly appreciated!
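For reference, here is a minimal sketch of how grepl() could be combined with DimPlot's cells.highlight argument. It assumes the cdr3s_aa metadata column built above and a reduction named "umap" in scRNAseq.seurat. The helper name cells_with_cdr3 is hypothetical, and the toy vector is copied from the head() output above so the matching step can be tried without the full object.

```r
# Hypothetical helper: return the barcodes whose cdr3s_aa entry contains
# any of the given chains. fixed = TRUE treats each chain as a literal
# substring rather than a regular expression.
cells_with_cdr3 <- function(cdr3s, chains) {
  hit <- Reduce(`|`, lapply(chains, grepl, x = cdr3s, fixed = TRUE))
  names(cdr3s)[hit]
}

# Toy input: three entries taken from the head() output in the question.
toy_cdr3s <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGAGTGCGATG-1" = "TRB:CSGKEGGMGTEAFF;TRA:CALSDRGSGNTPLVF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF"
)
targets <- c("TRA:CAYGPPPAGNMLTF", "TRA:CAAREGDKIIF")
toy_hits <- cells_with_cdr3(toy_cdr3s, targets)

# With the real object (assumes Seurat is installed and a "umap" reduction exists).
if (exists("scRNAseq.seurat")) {
  highlight <- cells_with_cdr3(scRNAseq.seurat$cdr3s_aa, targets)
  print(Seurat::DimPlot(scRNAseq.seurat, reduction = "umap",
                        cells.highlight = highlight))
}
```

Because this is a substring match, a target would also match any longer chain that contains it. If that is a concern, splitting each entry on ";" and comparing whole chains would be stricter.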
It looks like there might not be any exact matches to that sequence in your data. You might want to check the prevalence of certain sequences to see what you're dealing with.
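One possible way to check prevalence, sketched in base R: split each cdr3s_aa entry on ";" and tabulate the individual chains. The helper name chain_counts is hypothetical, and the toy vector reuses entries from the head() output in the question. With the real data you would pass scRNAseq.seurat$cdr3s_aa instead.

```r
# Hypothetical helper: count how many cells carry each individual chain.
chain_counts <- function(cdr3s) {
  # Drop cells without a clonotype, then split "TRB:...;TRA:..." into chains.
  chains <- strsplit(as.character(cdr3s[!is.na(cdr3s)]), ";", fixed = TRUE)
  # unique() so that a chain is counted at most once per cell.
  sort(table(unlist(lapply(chains, unique))), decreasing = TRUE)
}

# Toy input: entries taken from the head() output in the question, plus one NA
# to stand in for a cell with no VDJ call.
toy_cdr3s <- c(
  "AAACCTGAGCGTGTCC-1" = "TRB:CASGRTGTYEQYF;TRA:CAAREGDKIIF;TRA:CASDAGNMLTF",
  "AAACCTGCAAGGGTCA-1" = "TRB:CASSRGWRQETQYF;TRA:CAAPINFGNEKLTF",
  "AAACCTGCACGCGAAA-1" = "TRB:CASSPTGRDNTEAFF;TRA:CAYGPPPAGNMLTF;TRA:CGAVNSGGYQKVTF",
  "NO-VDJ-CELL-1"      = NA
)
counts <- chain_counts(toy_cdr3s)
print(counts)
```

A chain that is absent from names(counts) has no match anywhere in the column, which would explain a "No cells found" error for that sequence.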