Question

Tutorial:Associating VDJ clonotyping data with scRNA-seq in Seurat

8

Entering edit mode

5.5 years ago

jared.andrews07 ★ 18k

Single cell technologies are booming, but figuring out how to integrate multimodal data sets is certainly still a challenge, even with well-supported and constantly updated software (like Seurat). I struggled to find any info as to how to associate my Chromium 10X TCR-sequencing data with the corresponding scRNA-seq data. Seurat has a fairly good vignette on associating multimodal data, but that approach doesn't really work for V(D)J cell profiling.

The author's commented that the way to do it was by adding to the metadata of a Seurat object, but the method to do that remained unclear (partially due to the AddMetaData function not having great documentation). In reality, it's pretty easy - you can add any dataframe as Metadata to a Seurat object so long as its row names match the objects as shown with object@meta.data.

The VDJ info from 10X takes some amount of fanagling to get in a proper data frame with 1 row per cell with usable clonotype info. This R function makes it straightforward for a typical CellRanger3 workflow used for both scRNA and TCR-seq and Seurat v3:

add_clonotype <- function(tcr_location, seurat_obj){
    tcr <- read.csv(paste(tcr_folder,"filtered_contig_annotations.csv", sep=""))

    # Remove the -1 at the end of each barcode.
    # Subsets so only the first line of each barcode is kept,
    # as each entry for given barcode will have same clonotype.
    tcr$barcode <- gsub("-1", "", tcr$barcode)
    tcr <- tcr[!duplicated(tcr$barcode), ]

    # Only keep the barcode and clonotype columns. 
    # We'll get additional clonotype info from the clonotype table.
    tcr <- tcr[,c("barcode", "raw_clonotype_id")]
    names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"

    # Clonotype-centric info.
    clono <- read.csv(paste(tcr_folder,"clonotypes.csv", sep=""))

    # Slap the AA sequences onto our original table by clonotype_id.
    tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])

    # Reorder so barcodes are first column and set them as rownames.
    tcr <- tcr[, c(2,1,3)]
    rownames(tcr) <- tcr[,1]
    tcr[,1] <- NULL

    # Add to the Seurat object's metadata.
    clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)
    return(clono_seurat)
    }

This saves a clonotype ID and CDR3 AA sequence, as the clonotype ID may not be usable as an identifier if samples are combined.

Hopefully this saves someone a bit of time.

single-cell RNA-Seq seurat TCR • 16k views

ADD COMMENT • link updated 10 weeks ago by Kimaya • 0 • written 5.5 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Since you had some confusion about AddMetaData, I just wanted to add that object@meta.data is just a data frame. If you are more comfortable with standard R data frames, you can do something like

object@meta.data$new_col <- some_values

or

object@meta.data[cell_barcodes, "new_col"] <- my_df[cell_barcodes, "some_col"]

to make sure that you are using the same cell barcodes for the labels.

Before v3, the AddMetaData function was actually relatively simple: https://github.com/satijalab/seurat/blob/65b77a9480281ef9ab1aa8816f7c781752092c18/R/interaction.R#L1120-L1137

ADD REPLY • link 5.5 years ago by igor 13k

0

Entering edit mode

Thanks for the addition. Yeah, I actually went back to v2 to figure it out. I'm not quite sure why they changed the documentation to something that's (imo) more confusing, but you're dead-on that it's really simple once you realize it's just a data frame.

ADD REPLY • link 5.5 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Thanks for this really helpful tutorial. I have now added my VDJ data to two seurat objects ("before" and "after" treatment) and then intergrated them. I now have two questions that I was wondering if you could help me with please? I would like to subset my Seurat object so that I retain the top 10 most abundant clones from "before" and "after" so that I can compare their gene expression. However, whilst I can see in the $clonotype_id that each cell has the information e.g.

head(immune.combined$clonotype_id) AAACGGGAGTGGGTTG AAAGTAGAGAAACGAG AAAGTAGAGACTGGGT AAAGTAGCAGTAAGCG "clonotype25" "clonotype6" "clonotype1" NA AAATGCCAGAGATGAG AACACGTCAGATTGCT "clonotype1" "clonotype26"

(where clonotype1 was the most abundant clonotype) I can't work out how to subset on "clonotype1", "clonotype2" etc. Please could you tell me how to do this? Also, each of the original seurat objects had its own clonotype1, clonotype2 etc. When I integrated them will they have been renamed or will clonotype1 now refer to two different cdr3s, one from "before" and one from "after"? If so, how would I go about subsetting the clonotype that I am interested in from just one condition. Many thanks in advance for your help.

ADD REPLY • link 5.0 years ago by Hanna ▴ 10

1

Entering edit mode

The clonotype IDs will repeat on a sample-by-sample basis, that's why I grab the sequences here, as they are directly comparable. To elaborate, clonotype 1 for one sample will not be the same as clonotype 1 from another sample.

Do you want to compare the 10 most abundant clones from "before" and the 10 most abundant from "after", or just the 10 most abundant clones overall (but compared between the two groups)?

You can see how to subset Seurat objects here (click the Seurat Object Interaction tab).

ADD REPLY • link 5.0 years ago by jared.andrews07 ★ 18k

1

Entering edit mode

Thanks, for some reason I was trying to make subsetting the seurat object far more complicated than it needed to be and your link solved it for me. I wanted to subset on the top 10 from each group, so this was easy to do as the names have remained the same, but now I can see that if I do want to look at just the top 10 overall I could subset on the cdr3s_aa by the specific aa sequence. Thanks again for your help!

ADD REPLY • link 5.0 years ago by Hanna ▴ 10

0

Entering edit mode

Thank you for this helpful tutorial. I want to ask a question with this function. I am a new user, so I don't understand why the following code doesn't work for merging TCR and gene expression data.

add_clonotype <- function(fullpath, pbmc_5){
  fullpath = getwd()
  tcr <- read.csv(paste(fullpath, "G3_filtered_contig_annotations.csv", sep=""))
  tcr$barcode <- gsub("-1", "", tcr$barcode)
  tcr <- tcr[!duplicated(tcr$barcode), ]
  tcr <- tcr[,c("barcode", "raw_clonotype_id")]
  names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"
  clono <- read.csv(paste(fullpath, "G3_clonotypes.csv", sep=""))
  tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])
  tcr <- tcr[, c(2,1,3)]
  rownames(tcr) <- tcr[,1]
  tcr[,1] <- NULL
  clono_seurat <- AddMetaData(object=pbmc_5, metadata=tcr)
  return(clono_seurat)
}

clonotypeadd <- add_clonotype()

pbmc_5 is a seurat object. This code just causes Error in file(file, "rt") : cannot open the connection. Please help me to figure it out. Many thanks.

ADD REPLY • link 4.8 years ago by moon.js • 0

1

Entering edit mode

Did you try running it one line at a time? That way, you can figure out exactly which command is problematic.

ADD REPLY • link 4.8 years ago by igor 13k

0

Entering edit mode

This is a good idea. In particular, check that your files paths are correct for the read.csv calls, as that's likely where it is going wrong.

ADD REPLY • link 4.8 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Sorry for late reply. I run it one line at a time and it works. Thanks again for your help.

ADD REPLY • link 4.7 years ago by moon.js • 0

0

Entering edit mode

Hello, Very sory for this basic question, But I have a problem using this function: Error in paste(tcr_folder, "filtered_contig_annotations.csv", sep = "") : object 'tcr_folder' not found

However, I've already created a folder named tcr_folder containing the required files.

Many thanks

ADD REPLY • link 4.3 years ago by ak_azouz • 0

1

Entering edit mode

Are you sure your paths are correct from your current working directory? How are you calling the function?

ADD REPLY • link 4.3 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

Thank you very much for your reply, and very sorry for this delay to answer. Everything works now.

Thanks again

ADD REPLY • link 4.3 years ago by ak_azouz • 0

0

Entering edit mode

Hello, I am having a bit of an issue. My "clonotype_id" and "cdr33s_aa" keep coming up NA for all values, bust when I do clono_seurat$(either of those values) it lists the correct values but with NA printed above each value. Also my barcode did not get rid of the -1 tail. Has anyone else experienced this/what should I do? Thanks!

ADD REPLY • link 4.2 years ago by roberts ▴ 60

0

Entering edit mode

I aim to do the same i.e. combine scRNA and VDJ data. I analyzed my scRNA data first, following the standard Seurat workflow, annotated the clusters, and then combined VDJ data to this Seurat data frame which contained cell type information.

I wanted to analyze the data the other way around wherein I first combine the two data frames, get only the cells that intersect in both GEX and VDJ datasets by barcodes (as per your tutorial. The combined dataset had few thousand fewer cells than in GEX seurat objetc). It all worked fine and I have a combined data frame. But now since I want to perform downstream seurat normalization and so on, I will need this combined data frame in my seurat object's metadata slot. This isn't working. Below is my code for which I get error:

clono_seurat <- AddMetaData(object = merged5Seurat, metadata = combined_GEX_TCR)

Error in [[<-: ! Cannot add more or less meta data without cell names Run rlang::last_trace() to see where the error occurred.

Is it a bad idea to first combine and then analyze? I thought I would get cleaner analysis that way and it shouldn't matter as long as I can add it to seurat's metadata. Why do I get this error? Is it because of different cell numbers in the two objects ?

Any help would be appreciated

ADD REPLY • link updated 10 weeks ago by GenoMax 147k • written 10 weeks ago by Kimaya • 0

0

Entering edit mode

Presumably one of the objects is missing cell names, as the error says. This makes it impossible to match up the data to add. My expectation is that the rownames for combined_GEX_TCR do not contain the cell names.

But I also have not kept up with Seurat's versions given how mangled their data structures had become last time I tried it. I'd consider this tutorial likely out of date at this point, though I expect the process it follows is still valid.

There are likely better tools to do this at this point - take a look at scRepetoire and friends.

ADD REPLY • link 10 weeks ago by jared.andrews07 ★ 18k

0

Entering edit mode

Hey Jared! Thanks for your reply. I understand the point of row names not being the same in the two data frames to be combined. But in my case, I created a barcode column in both the data frames and used this column as an anchor to merge them both.

I figured the problem out. The number of rows (/cells) in my merged5seurat didn't match the numbers of rows from my combined_GEX_TCR. I first subsetted my Seurat object to only contain rows that intersect with combined_GEX_TCR and then used AddMetaData with this subsetted Seurat object.

Thanks!

ADD REPLY • link 10 weeks ago by Kimaya • 0

score 0 · Answer 1 · 2019-06-14

0

Entering edit mode

5.5 years ago

atakanekiz ▴ 310

@j-andrews7 thanks for this vignette. I expanded it further to work with my own data.

The detailed tutorial can be found at my Biostars post.

ADD COMMENT • link 5.5 years ago by atakanekiz ▴ 310

0

Entering edit mode

Hi Jared & Atakanekiz,

Thank you for the above information on how to incorporate VDJ information to gene expression data for scRNA seq. I have a question regarding visualizing clonotypes. I have used the Seurat integrated analysis approach to analyze my gene expression data. The metadata contains VDJ information. How do I proceed with visualizing the clonotypes in plots - UMAP? Right now, I am using the following and getting an error:

Idents(seurat_subset_labeled) <- "cdr3s_aa" DimPlot(seurat_subset_labeled, reduction = "umap", label = TRUE, label.size = 3, repel = TRUE)

Error in grid.Call(C_convert, x, as.integer(whatfrom), as.integer(whatto), : Viewport has zero dimension(s)

I read online that this error is seen when the legend is too large - it makes sense in this case since I'm using cdr3_aa as idents for clustering. Any suggestions on how to proceed? Thank you

ADD REPLY • link 4.6 years ago by divya.samuel • 0

0

Entering edit mode

Yeah, viewing cdr3 in a dimplot is a no-go unless you have high clonality or only a few sequences you're interested in. Otherwise, depends what you're interested in, but UMAP/tSNE plots are probably not the way to look at them more closely.

ADD REPLY • link 4.6 years ago by jared.andrews07 ★ 18k

0

Entering edit mode

I visualise the clonotypes like this (sorry I don't know how to format the R script in a post): for a specific paired alpha and beta TCR or a specific TCRbeta alone with no alpha (replace the asterisks with the exact sequence)

E1 <- subset(x = Tcells, subset = cdr3s_aa == "TRA:****;TRB:***"|cdr3s_aa =="TRB:*****" )

p1<- DimPlot(E1, reduction = "umap", cols= c("navyblue","turquoise4" ), label = FALSE)+xlim(c(-12, 0)) + ylim(c(-10, 10)) +NoLegend()+ ggtitle("Your title")

or for looking at any expanded clones

postdoubleOrMore <- subset(x = Tcells, subset = frequency>1)

DimPlot(postdoubleOrMore, reduction = "umap", split.by = "stim",cols= c( "orange", "red" ,"cadetblue3", "navyblue", "dodgerblue", "turquoise4" ), ncol = 2)

Note that I have specified colours based on the number of idents that I know that I have in each of these subsets, just to keep colours consistent throughout

ADD REPLY • link 4.6 years ago by Hanna ▴ 10

0

Entering edit mode

Like Jared says, plotting all the clonotypes will be impossible to look at. But, if you want to plot a specific clonotype (say, transgenic OT-1 TCR) against all the other clonotypes (all non transgenic TCRs), you can use the snippet below:

obj$tcr <- ifelse(grepl(pattern ="TRAV14N-1__TRBV12-1",obj$v_gene), "OT1", 
                       ifelse(!is.na(obj$v_gene), "non-OT1", "n/a"))

Idents(obj) <- "tcr"

obj$tcr <- factor(obj$tcr, levels=c("OT1", "non-OT1", "n/a"))

DimPlot(object = obj, reduction = "umap", label = F, pt.size = 0.5, cols = c("blue", "gold2", "gray90"))

ADD REPLY • link 4.6 years ago by atakanekiz ▴ 310

0

Entering edit mode

Hey everyone, I have a followup question on visualizing the different clonotypes on a UMAP. Would be there a way to visualize lets say clonotype1 on my umap clusters? so in a way I want to be able to see weather a clonotype is more represented in a cluster of my UMAP. Any ideas on that? I tried several things but until now I failed.

Thanks in advance

ADD REPLY • link 3.5 years ago by chi.delta ▴ 40

1

Entering edit mode

Please open your own question with what you've tried, what you want, and the errors/issues you are facing. Piggybacking on questions like this is generally discouraged and makes it hard for people to find (and answer) your question.

ADD REPLY • link 3.5 years ago by jared.andrews07 ★ 18k