Question

Trouble getting IntegrateData to include specified features in Seurat

3.5 years ago

Aaron ▴ 30

I'm trying to integrate my control and treatment data in Seurat. My experiment is a PDX/barnyard experiment, so my data come from both human and mouse (a human tumor implanted into mice). The integration step keeps filtering out all of my human genes. I've been trying to force Seurat to use the human genes in the integration, along with whatever other genes it considers important, but I've hit a variety of errors, including the one below, for which I can't find any help online:

> seurat_integrated <- IntegrateData(anchorset=anchors, features.to.integrate=integ_features)
Merging dataset 2 into 1
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Warning: Not all features provided are in this Assay object, removing the following feature(s): GRCh38-CAMTA1, GRCh38-EFHD2, GRCh38-MRTO4, GRCh38-CAPZB, GRCh38-CDC42, ...

Could someone possibly tell me what steps I might take to successfully integrate my PDX data, and perhaps also any relevant "best practices" for integrating PDX data (either in RNA-seq analysis generally, or in Seurat alone)? I'd be very grateful for whatever help you could provide and please let me know if anything is unclear!

For reference, here is the portion of my script that deals with integration:

# Basic pipeline and integration
CTRL[["groups"]] <- "CTRL"
TREAT[["groups"]] <- "TREAT"
combined <- merge(CTRL, TREAT)

split_seurat <- SplitObject(combined, split.by="groups")
split_seurat <- split_seurat[c("CTRL", "TREAT")]
split_seurat[["human.percent.mt"]] <- NULL

for (i in 1:length(split_seurat)) {
  split_seurat[[i]] <- NormalizeData(split_seurat[[i]])
  split_seurat[[i]] <- FindVariableFeatures(split_seurat[[i]], selection.method="vst", nfeatures=3000)
  split_seurat[[i]] <- ScaleData(split_seurat[[i]])
  split_seurat[[i]] <- RunPCA(split_seurat[[i]])
  split_seurat[[i]] <- subset(split_seurat[[i]], subset=nCount_RNA > min_reads)

  split_seurat[[i]] <- SCTransform(split_seurat[[i]], vars.to.regress = c("mouse.percent.mt", "human.percent.mt"))
}

integ_features <- SelectIntegrationFeatures(object.list=split_seurat, nfeatures=3000)

integ_features <- c(integ_features, rownames(human_CTRL), rownames(human_TREAT), more_mouse_genes)
integ_features <- unique(integ_features)
integ_features <- unlist(integ_features)

anchors <- FindIntegrationAnchors(object.list=split_seurat, anchor.features=integ_features)
seurat_integrated <- IntegrateData(anchorset=anchors, features.to.integrate=integ_features)
DefaultAssay(seurat_integrated) <- "integrated"
saveRDS(seurat_integrated, "/home/asd3535/seurat_integrated.rds")

score 1 · Answer 1 · 2022-01-06

3.5 years ago

jared.andrews07 ★ 19k

Do you need the mouse cells/genes? If not, just remove them and save yourself the headache. Or analyze them separately.
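As a rough illustration of separating the two species, here is a minimal base R sketch on a toy count matrix. The "GRCh38-" prefix is taken from the warning in the question; the "mm10-" prefix, the gene names, and the 0.9 cutoff are assumptions made up for the example, so adjust them to whatever your reference actually uses.

```r
# Toy genes x cells count matrix standing in for a barnyard/PDX dataset.
# "GRCh38-" comes from the warning in the question; "mm10-" is an assumed mouse prefix.
counts <- matrix(
  c(5, 0, 3,
    2, 0, 4,
    0, 7, 1,
    0, 9, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("GRCh38-CDC42", "GRCh38-CAPZB", "mm10-Ptprc", "mm10-Cd3e"),
    c("cell1", "cell2", "cell3")
  )
)

# Split the gene names by species prefix.
human_genes <- grep("^GRCh38-", rownames(counts), value = TRUE)
mouse_genes <- grep("^mm10-", rownames(counts), value = TRUE)

# Fraction of each cell's counts that come from mouse genes.
mouse_fraction <- colSums(counts[mouse_genes, , drop = FALSE]) / colSums(counts)

# Call a cell "mouse" when most of its counts are mouse (0.9 is an arbitrary cutoff).
mouse_cells <- names(mouse_fraction)[mouse_fraction > 0.9]

# Keep only mouse genes in mouse cells.
mouse_counts <- counts[mouse_genes, mouse_cells, drop = FALSE]
```

With real data, a matrix filtered this way could then be passed to CreateSeuratObject(counts = mouse_counts) and analyzed on its own.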

My guess is that there's much more variability among the mouse cells (and genes), since several cell types are being captured, than within the (relatively) homogeneous human cells of your PDX. That probably makes the variable-feature step return mostly mouse genes. I can't remember whether the scale.data slot is used by default for integration. Since you're using SCT, you should specify that in the FindIntegrationAnchors step with normalization.method = "SCT", as noted in the function reference. That might fix your problem; changing the assay used might also help.
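For what it's worth, an SCT-based integration could be sketched roughly as below. This assumes a Seurat version that provides PrepSCTIntegration, FindIntegrationAnchors, and IntegrateData with a normalization.method argument (as Seurat v3/v4 do), and that every object in the list has already been run through SCTransform. The wrapper name integrate_sct and its arguments are made up for the example.

```r
# Sketch of SCT-based integration; requires the Seurat package when called.
# `object_list` is a list of Seurat objects that have each been through SCTransform().
# `integrate_sct` and `n_features` are illustrative names, not part of the original script.
integrate_sct <- function(object_list, n_features = 3000) {
  # Pick features that are variable across the objects.
  features <- Seurat::SelectIntegrationFeatures(
    object.list = object_list,
    nfeatures = n_features
  )
  # Make sure SCT residuals for those features exist in every object.
  object_list <- Seurat::PrepSCTIntegration(
    object.list = object_list,
    anchor.features = features
  )
  # Tell both anchor finding and integration that the data are SCT-normalized.
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = object_list,
    normalization.method = "SCT",
    anchor.features = features
  )
  Seurat::IntegrateData(anchorset = anchors, normalization.method = "SCT")
}
```

It would be called as seurat_integrated <- integrate_sct(split_seurat). Whether this keeps the human genes you care about still depends on which features are selected, so treat it as a starting point.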

It's also not really clear how some of the things in this line are being defined:

integ_features <- c(integ_features, rownames(human_CTRL), rownames(human_TREAT), more_mouse_genes)
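One way to check a combined feature vector like this is to compare it against the genes actually present in each object before integrating. The sketch below uses plain character vectors with invented contents as stand-ins for the per-object gene names; the warning in the question suggests that some requested features are absent from at least one assay, but that is a guess about the cause.

```r
# Hypothetical gene names per object, standing in for rownames() of each assay.
features_by_object <- list(
  CTRL  = c("GRCh38-CDC42", "GRCh38-CAPZB", "mm10-Ptprc"),
  TREAT = c("GRCh38-CDC42", "mm10-Ptprc", "mm10-Cd3e")
)

# Features we would like to integrate (a stand-in for integ_features).
requested <- c("GRCh38-CDC42", "GRCh38-CAPZB", "mm10-Ptprc", "GRCh38-MRTO4")

# Genes present in every object.
shared <- Reduce(intersect, features_by_object)

# Requested features that every object can actually supply.
usable <- intersect(requested, shared)

# Requested features missing from each object, for diagnosis.
missing_by_object <- lapply(features_by_object, function(f) setdiff(requested, f))
```

With Seurat objects, the per-object list could be built with something like lapply(split_seurat, rownames), which should return the feature names of each object's default assay.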

Thanks @jared.andrews07, these thoughts are very helpful. I hope this isn't too simple a question, but if the focus of my project is on changes in immune cells, would it make sense to remove the human cancer cells and analyze only the mouse cells? (The human cancer cells from the treatment group also don't seem to be of great quality.) This is how I was doing it initially, but after reading more I came to feel that it must be wrong.

ADD REPLY • link 3.5 years ago by Aaron ▴ 30

If response in the tumor microenvironment is the question, then I don't see why not, unless there are changes/correlates in the tumor that are of interest. Though again, analyzing them separately, if needed, seems more straightforward, especially for a first pass.