Question

Why do tSNE and UMAP give ill-defined and unclear clusters?


5.9 years ago

wayj86 ▴ 40

Hi,

I am using Seurat 3.1 to integrate my 11 samples (2 knock-out, 3 wild-type, 3 knock-in, and 3 overexpression) via the standard workflow. The UMAP result looks ill-defined and unclear:

[Image: UMAP plot]

In another attempt, I also tried tSNE, but the result also looks weird:

[Image: tSNE plot]

I tried different values for dims, but the result didn't improve. Could you please tell me how to improve the tSNE and UMAP results when using Seurat 3.1? Thanks a million in advance.

Stanley

RNA-Seq • 5.5k views

ADD COMMENT • link 5.9 years ago by wayj86 ▴ 40


Not enough information to really answer. How are you integrating? Using Seurat's method or one of the wrappers? How many cell types would you expect based on your sample? Are you doing any differentiation? If so, the oddities in the UMAP structure would make more sense.

ADD REPLY • link 5.9 years ago by jared.andrews07 ★ 19k


Thanks a lot for your answer. I expect 6-7 major cell types. The script I used to integrate my data is listed below:

    options(stringsAsFactors = FALSE)
    library(Seurat)
    library(dplyr)
    library(ggplot2)
    library(cowplot)
    #For KIHO66
    KIHO66.data <- Read10X(data.dir = "./")
    KIHO66 <- CreateSeuratObject(counts = KIHO66.data, min.cells = 3, min.features = 200, project = "KIHO66_geneX") #23117 features across 7691 samples within 1 assay
    KIHO66 <- RenameCells(KIHO66, add.cell.id = "KIHO66")
    KIHO66[["percent.mt"]] <- PercentageFeatureSet(KIHO66, pattern = "^mt-")
    pdf("KIHO66_1.pdf")
    VlnPlot(KIHO66, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
    dev.off()
    pdf("KIHO66_2.pdf")
    plot1 <- FeatureScatter(KIHO66, feature1 = "nCount_RNA", feature2 = "percent.mt")
    plot2 <- FeatureScatter(KIHO66, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
    CombinePlots(plots = list(plot1, plot2))
    dev.off()
    KIHO66 <- subset(KIHO66, subset = nFeature_RNA > 200 & nFeature_RNA < 9000 & percent.mt < 5) #23117 features across 6087 samples within 1 assay 
    KIHO66 <- NormalizeData(KIHO66, normalization.method = "LogNormalize", scale.factor = 10000)
    KIHO66 <- FindVariableFeatures(KIHO66, selection.method = "vst", nfeatures = 2000)
    KIHO66@meta.data$sample <- "KIHO66"
    KIHO66@meta.data$treatment <- "KI"

    # ... and likewise for the other 10 samples ...

    # Integration
    reference.list <- list(KIHO66, KIHO82, KIHO96, KOHO37, KOHO41, WT23, WT26, WT84)
    geneX.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:30)
    geneX.integrated <- IntegrateData(anchorset = geneX.anchors, dims = 1:30)
    DefaultAssay(geneX.integrated) <- "integrated"
    # Scale the integrated assay and reduce it to 30 principal components
    geneX.integrated <- ScaleData(geneX.integrated, verbose = FALSE)
    geneX.integrated <- RunPCA(geneX.integrated, npcs = 30)
    # t-SNE on the first 30 PCs (the Seurat v3 argument is 'dims')
    geneX.integrated <- RunTSNE(object = geneX.integrated, dims = 1:30)
    # Graph-based clustering on the same PCs
    geneX.integrated <- FindNeighbors(geneX.integrated, reduction = "pca", dims = 1:30)
    geneX.integrated <- FindClusters(geneX.integrated, resolution = 0.4)
    p1 <- DimPlot(geneX.integrated, reduction = "tsne", group.by = "treatment")
    p2 <- DimPlot(geneX.integrated, reduction = "tsne", label = TRUE)
    plot_grid(p1, p2)
    DimPlot(geneX.integrated, reduction = "tsne", split.by = "treatment", label = TRUE)

ADD REPLY • link 5.9 years ago by wayj86 ▴ 40

score 3 · Answer 1 · 2019-12-19


5.9 years ago

Jean-Karim Heriche 27k

Without access to the data, we can't do much to help. With these algorithms, choosing appropriate parameters is important. For t-SNE, the main one is perplexity; for UMAP, the main parameters are the number of neighbours and, to a lesser degree, the minimum distance. I suggest you become familiar with the algorithms and their parameters. This page on how to use t-SNE effectively may help. For UMAP, try this page on understanding UMAP, which also compares it with t-SNE. Also read the UMAP documentation.
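
As a rough sketch of what exploring these parameters could look like in Seurat 3.x: the helper below re-runs t-SNE over a few perplexity values and UMAP over a few neighbour counts, storing each embedding under its own reduction name so they can be compared side by side. The helper name `sweep_embeddings`, the reduction names, and the specific values are illustrative assumptions, not recommendations. It assumes the object already has a PCA reduction, and it relies on `RunTSNE` forwarding `perplexity` to the underlying Rtsne call.

```r
library(Seurat)

# Hypothetical helper: compute several t-SNE and UMAP embeddings on the same PCs.
# The parameter values are placeholders to adapt to the data set size.
sweep_embeddings <- function(obj, dims = 1:30,
                             perplexities = c(10, 30, 100),
                             n_neighbors = c(5, 30, 100),
                             min_dist = 0.3) {
  # One t-SNE per perplexity, stored as "tsne_p<perplexity>"
  for (p in perplexities) {
    obj <- RunTSNE(obj, dims = dims, perplexity = p,
                   reduction.name = paste0("tsne_p", p),
                   reduction.key = paste0("tSNEp", p, "_"))
  }
  # One UMAP per neighbour count, stored as "umap_n<neighbours>"
  for (k in n_neighbors) {
    obj <- RunUMAP(obj, dims = dims, n.neighbors = k, min.dist = min_dist,
                   reduction.name = paste0("umap_n", k),
                   reduction.key = paste0("UMAPn", k, "_"),
                   verbose = FALSE)
  }
  obj
}
```

With the object from the question, this might be used as `geneX.integrated <- sweep_embeddings(geneX.integrated)`, followed by something like `DimPlot(geneX.integrated, reduction = "umap_n30", group.by = "treatment")` for each stored embedding. If the overall structure changes a lot between settings, that is a hint that the apparent clusters depend on the parameters and not only on the data.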