Question

Seurat DefaultAssay: "integrated" or "RNA" with an integrated dataset?

3

5.6 years ago

cook.675 ▴ 240

I have a vehicle vs. treatment dataset that I am working through and came across the following in the Seurat vignette that I don't quite understand and wanted to ask some questions about:

Here is the vignette : https://satijalab.org/seurat/v3.1/immune_alignment.html

Here is the section of code I would like to focus on:

DefaultAssay(immune.combined) <- "integrated"
# Run the standard workflow for visualization and clustering
immune.combined <- ScaleData(immune.combined, verbose = FALSE)
immune.combined <- RunPCA(immune.combined, npcs = 30, verbose = FALSE)
# UMAP and clustering
immune.combined <- RunUMAP(immune.combined, reduction = "pca", dims = 1:20)
immune.combined <- FindNeighbors(immune.combined, reduction = "pca", dims = 1:20)
immune.combined <- FindClusters(immune.combined, resolution = 0.5)

Right after the integration steps and before the clustering steps, the DefaultAssay is changed to "integrated".

1. What is this assay, and how and why is it different from "RNA"?
2. What are the consequences of NOT changing it and leaving "RNA" as the default?
3. I noticed that when I leave my DefaultAssay as "RNA" and do not run this command, the software finds more DE genes in the downstream FindMarkers analysis. If I leave the default assay as "RNA", will I get the same results, just with fewer genes?
4. Is it necessary to change the DefaultAssay to "integrated" for a dataset like mine, which compares two treatments that are integrated?

Thank you!

seurat • 39k views

ADD COMMENT • link updated 5.6 years ago by shoujun.gu ▴ 350 • written 5.6 years ago by cook.675 ▴ 240

score 5 · Answer 1 · 2019-09-22

5

5.6 years ago

shoujun.gu ▴ 350

1. The data integration process returns a matrix of "corrected" values. Setting DefaultAssay to "integrated" means the subsequent analysis runs on the corrected values. Setting DefaultAssay to "RNA" means the subsequent analysis runs on the original values.
2. See above.
3. You won't get the same results, since you are analyzing two different sets of data.
4. You can use the integrated data for clustering. However, it is not appropriate to use the integrated data for DE, and most tools only accept raw counts for DE.
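
To illustrate the last point above, here is a minimal sketch of running cluster marker detection on the original "RNA" assay even when "integrated" is the default. It relies on the `assay` argument of `FindMarkers`. The helper name `cluster_markers_rna` is hypothetical (not part of Seurat), and the sketch assumes the "RNA" assay was already normalized before integration and that cluster labels are the active identities.

```r
library(Seurat)

# Hypothetical helper (not a Seurat function): find markers for one cluster
# using the original "RNA" assay, whatever the current default assay is.
# Assumes the "RNA" assay holds normalized data and that clusters are the
# active identities of `obj`.
cluster_markers_rna <- function(obj, cluster_id, ...) {
  FindMarkers(
    obj,
    ident.1 = cluster_id,  # cluster to compare against all other cells
    assay   = "RNA",       # use original values, not the corrected ones
    verbose = FALSE,
    ...
  )
}
```

With the vignette object this might be called as `cluster_markers_rna(immune.combined, 7)`. The default assay is left untouched, so clustering-related steps can keep using "integrated".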

ADD COMMENT • link 5.6 years ago by shoujun.gu ▴ 350

0

You can use integrated data to do clustering. But it is not proper to use integrated for DE, and most tools only accept raw counts for DE.

So we switch to the integrated assay for the dimensional analysis and clustering, but switch back to using RNA assay (counts) to locate cluster biomarkers (DE for clusters) and DE by treatment group (within cluster)? This is what I gather from your response and also looking back over the vignette. I just want to be sure.

Thanks so much

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240

4

This issue is addressed by Seurat developers:

ADD REPLY • link 5.6 years ago by igor 13k

0

Thank you so much, this is very helpful.

ADD REPLY • link 5.6 years ago by cook.675 ▴ 240

score 4 · Answer 2 · 2019-09-22

What is this assay, and how and why is it different from "RNA"?

This is the data after integration. You can think of it as batch-adjusted data.
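
One way to see the difference for yourself is to list the assays stored in the object and compare their sizes. The sketch below is only illustrative: `summarise_assays` is a hypothetical helper name, and it assumes an object that already went through integration. With default settings, the "integrated" assay typically contains only the features used for integration, so it has far fewer rows than "RNA".

```r
library(Seurat)

# Hypothetical helper (not a Seurat function): one row per assay, with its
# feature count and whether it is currently the default assay.
summarise_assays <- function(obj) {
  assay_names <- Assays(obj)
  n_features <- vapply(
    assay_names,
    function(a) as.numeric(nrow(obj[[a]])),
    numeric(1),
    USE.NAMES = FALSE
  )
  data.frame(
    assay            = assay_names,
    n_features       = n_features,
    is_default       = assay_names == DefaultAssay(obj),
    stringsAsFactors = FALSE
  )
}
```

For the vignette object, `summarise_assays(immune.combined)` should list both "RNA" and "integrated".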

What are the consequences of NOT changing it and leaving "RNA" as the default?

If you do not switch to the integrated assay, you will not be working with the integrated data. If you are not interested in the integrated data, then you don't need to perform integration. If you just want to combine two Seurat objects without any additional adjustments, there is a merge function and a vignette for that workflow.
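
For reference, a simple merge without integration could look roughly like this. The wrapper name `merge_without_integration` and the sample labels are hypothetical. The `merge` method for Seurat objects and its `add.cell.ids` argument are standard, and the result has no "integrated" assay because no correction is applied.

```r
library(Seurat)

# Hypothetical wrapper (not a Seurat function): combine two Seurat objects
# with no batch correction. `ids` prefixes the cell names so that barcodes
# shared between the two samples stay unique.
merge_without_integration <- function(obj_a, obj_b, ids = c("A", "B")) {
  merge(x = obj_a, y = obj_b, add.cell.ids = ids)
}
```

After a merge like this, the usual normalization, scaling, PCA and clustering steps would all run on the "RNA" assay.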

I noticed that when I leave my DefaultAssay as "RNA" and do not run this command, the software finds more DE genes in the downstream FindMarkers analysis. If I leave the default assay as "RNA", will I get the same results, just with fewer genes?

Different "assays" have different data that will lead to different results.

Is it necessary to change the DefaultAssay to "integrated" for a dataset like mine, which compares two treatments that are integrated?

In the linked tutorial, they are integrating a treated and an untreated sample. The goal is to find analogous subpopulations (such as B or T cells) within the two datasets. Since those exist in both datasets, it makes more sense for them to cluster together. The integrated representation shows the conserved aspects of the data. Once they identify the subpopulations, they then perform differential expression between the different treatments within each of them.
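
That last step, differential expression between treatments within one subpopulation, can be sketched as below. This is not the tutorial's exact code: `de_within_cluster` is a hypothetical helper, and the column name `stim` and the labels `STIM` and `CTRL` are assumptions based on the linked vignette's dataset. It uses the `group.by` and `subset.ident` arguments of `FindMarkers` and runs on the "RNA" assay.

```r
library(Seurat)

# Hypothetical helper (not a Seurat function): compare treated vs. control
# cells inside a single cluster, using the original "RNA" assay.
# Assumes clusters are the active identities and that `treatment_col` is a
# metadata column holding the treatment label of each cell.
de_within_cluster <- function(obj, cluster_id,
                              treatment_col = "stim",
                              treated = "STIM",
                              control = "CTRL",
                              ...) {
  FindMarkers(
    obj,
    ident.1      = treated,
    ident.2      = control,
    group.by     = treatment_col,  # compare by treatment label
    subset.ident = cluster_id,     # restrict to cells of this cluster
    assay        = "RNA",
    verbose      = FALSE,
    ...
  )
}
```

Clusters themselves would still come from the integrated assay, so that analogous cells from both treatments share a cluster label before this comparison is made.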