I'm trying to integrate my control and treatment data in Seurat. My experiment is a PDX/barnyard experiment, so my data come from both human and mouse cells (a human tumor implanted into mice). The integration step keeps filtering out all of my human genes. I've been trying to force Seurat to use the human genes in the integration, along with whatever other genes it considers important, but I've run into a variety of errors, including the one below, and I can't find anything online that helps me deal with it:
```
> seurat_integrated <- IntegrateData(anchorset=anchors, features.to.integrate=integ_features)
Merging dataset 2 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Warning: Not all features provided are in this Assay object, removing the following feature(s): GRCh38-CAMTA1, GRCh38-EFHD2, GRCh38-MRTO4, GRCh38-CAPZB, GRCh38-CDC42, ...
```
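The warning says that some of the requested features are not present in the assay being integrated, so `IntegrateData()` drops them. One possibility worth checking is that the human features exist in the raw RNA assay but not in every object's SCT assay (SCTransform can exclude genes detected in very few cells). A minimal base-R sketch for listing which requested features each object lacks; all feature names below are hypothetical stand-ins for `rownames()` of each object's default assay:

```r
# Hypothetical stand-ins for rownames() of each object's default assay
# (after SCTransform this would be the SCT assay).
assay_features <- list(
  CTRL  = c("GRCh38-CDC42", "GRCh38-CAPZB", "mm10-Cd3e", "mm10-Ptprc"),
  TREAT = c("GRCh38-CDC42", "mm10-Cd3e", "mm10-Ptprc")
)
# Hypothetical requested integration features
integ_features <- c("GRCh38-CDC42", "GRCh38-CAPZB", "GRCh38-CAMTA1", "mm10-Cd3e")

# Requested features that each object does not contain
missing_per_object <- lapply(assay_features, function(f) setdiff(integ_features, f))

# Requested features present in every object
shared <- Reduce(intersect, c(list(integ_features), assay_features))

# Number of missing features that are human ("GRCh38-" prefix), per object
n_missing_human <- sapply(missing_per_object, function(m) sum(startsWith(m, "GRCh38-")))
```

Running the same `setdiff()` against the real objects before calling `IntegrateData()` shows whether the dropped human genes are missing from one condition, from both, or only from the SCT assay.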
Could someone tell me what steps I might take to successfully integrate my PDX data, and perhaps also any relevant "best practices" for integrating PDX data (for RNA-seq analysis in general, or in Seurat specifically)? I'd be very grateful for any help you can provide, and please let me know if anything is unclear!
For reference, here is the portion of my script that deals with integration:
```r
# Basic pipeline and integration
library(Seurat)

# Tag each object with its condition, merge, then split back by condition
CTRL[["groups"]] <- "CTRL"
TREAT[["groups"]] <- "TREAT"
combined <- merge(CTRL, TREAT)
split_seurat <- SplitObject(combined, split.by="groups")
split_seurat <- split_seurat[c("CTRL", "TREAT")]
# On a list, this removes an element named "human.percent.mt" (split_seurat holds only CTRL and TREAT)
split_seurat[["human.percent.mt"]] <- NULL
for (i in 1:length(split_seurat)) {
  # Log-normalize, find variable features, scale and run PCA on the default (RNA) assay
  split_seurat[[i]] <- NormalizeData(split_seurat[[i]])
  split_seurat[[i]] <- FindVariableFeatures(split_seurat[[i]], selection.method="vst", nfeatures=3000)
  split_seurat[[i]] <- ScaleData(split_seurat[[i]])
  split_seurat[[i]] <- RunPCA(split_seurat[[i]])
  # Keep cells above a count threshold (min_reads is defined elsewhere)
  split_seurat[[i]] <- subset(split_seurat[[i]], subset=nCount_RNA > min_reads)
  # SCTransform adds an "SCT" assay and sets it as the default assay
  split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("mouse.percent.mt", "human.percent.mt"))
}
# Features ranked by the number of datasets in which they are variable
integ_features <- SelectIntegrationFeatures(object.list=split_seurat, nfeatures=3000)
# Add all human features plus extra mouse genes (human_CTRL, human_TREAT and more_mouse_genes are defined elsewhere)
integ_features <- c(integ_features, rownames(human_CTRL), rownames(human_TREAT), more_mouse_genes)
integ_features <- unique(integ_features)
integ_features <- unlist(integ_features)
anchors <- FindIntegrationAnchors(object.list=split_seurat, anchor.features=integ_features)
seurat_integrated <- IntegrateData(anchorset=anchors, features.to.integrate=integ_features)
DefaultAssay(seurat_integrated) <- "integrated"
saveRDS(seurat_integrated, "/home/asd3535/seurat_integrated.rds")
```
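The `mouse.percent.mt` and `human.percent.mt` columns passed to `vars.to.regress` are computed before this snippet and aren't shown. As a hedged illustration of what such per-species covariates contain, here is a base-R sketch on a toy count matrix. The `GRCh38-MT-` and `mm10-mt-` prefixes are assumptions (the exact prefixes depend on the reference build, so check them against the real feature names); in Seurat itself, `PercentageFeatureSet()` with a `pattern` performs essentially this calculation.

```r
# Toy count matrix (hypothetical): rows are features, columns are cells.
counts <- matrix(
  c(10,  0, 5,
     2,  0, 1,
     0, 20, 4,
     0,  5, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("GRCh38-ACTB", "GRCh38-MT-CO1", "mm10-Actb", "mm10-mt-Co1"),
    c("cell1", "cell2", "cell3")
  )
)

# Total counts per cell across both species
total <- colSums(counts)

# Assumed mitochondrial gene prefixes for each species
human_mt <- grepl("^GRCh38-MT-", rownames(counts))
mouse_mt <- grepl("^mm10-mt-", rownames(counts))

# Percentage of each cell's counts coming from human / mouse mitochondrial genes
human.percent.mt <- 100 * colSums(counts[human_mt, , drop = FALSE]) / total
mouse.percent.mt <- 100 * colSums(counts[mouse_mt, , drop = FALSE]) / total
```

Using all counts as the denominator matches the usual single-species definition; in a mixed-species dataset it is also reasonable to divide by each species' own counts instead, which changes the values for cells that contain both.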
Thanks @jared.andrews07, these thoughts are very productive. I hope this isn't too simple a question, but if the focus of my project is on the change in immune cells, would it make sense to remove the human cancer cells and analyze only the mouse cells? (The human cancer cells from the treatment group also don't seem to be of great quality.) This is how I was doing it initially, but after reading more I came to feel that this must be wrong.
If the response in the tumor microenvironment is the question, then I don't see why not, unless there are changes or correlates in the tumor that are of interest. Though again, analyzing the human and mouse cells separately, if needed, seems more straightforward, especially for a first pass.
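If the mouse immune cells are the focus, one common way to separate species in a barnyard/PDX dataset is to classify each cell by the fraction of its UMIs that map to human genes and keep only the clearly mouse cells. A minimal base-R sketch on a toy count matrix; the `mm10-` mouse prefix and the 0.9/0.1 cutoffs are assumptions for illustration, not values from this thread:

```r
# Toy UMI count matrix (hypothetical): rows are features, columns are cells.
counts <- matrix(
  c(50,  1, 20,
    30,  0, 25,
     2, 60, 15,
     0, 40, 20),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("GRCh38-ACTB", "GRCh38-CDC42", "mm10-Actb", "mm10-Cd3e"),
    c("cell1", "cell2", "cell3")
  )
)

is_human <- startsWith(rownames(counts), "GRCh38-")
is_mouse <- startsWith(rownames(counts), "mm10-")  # assumed mouse prefix

# Fraction of each cell's species-assigned UMIs that come from human genes
human_umis <- colSums(counts[is_human, , drop = FALSE])
mouse_umis <- colSums(counts[is_mouse, , drop = FALSE])
human_frac <- human_umis / (human_umis + mouse_umis)

# Hypothetical cutoffs: >= 0.9 human, <= 0.1 mouse, anything in between "mixed"
species <- ifelse(human_frac >= 0.9, "human",
                  ifelse(human_frac <= 0.1, "mouse", "mixed"))

# Keep only mouse cells and mouse genes for a TME-focused analysis
mouse_counts <- counts[is_mouse, species == "mouse", drop = FALSE]
```

In a Seurat object the same labels could be stored as cell metadata and passed to `subset()`. Cells in the "mixed" range may be cross-species doublets and are worth inspecting rather than forcing into either group.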
Thank you @jared.andrews07 this is very helpful!