Why do tSNE and UMAP give ill-defined, unclear clusters?
5.0 years ago
wayj86 ▴ 40

Hi,

I am using Seurat 3.1 to integrate my 11 samples (2 knock-out, 3 wild-type, 3 knock-in and 3 overexpression) via the standard workflow. The UMAP result looks ill-defined and unclear:

[image: UMAP plot]

In another attempt, I also tried tSNE, but the result also looks odd:

[image: tSNE plot]

I tried different values of dims, but the result didn't improve. Could you please tell me how to improve the tSNE and UMAP results when using Seurat 3.1? Thanks a million in advance.

Stanley

RNA-Seq • 4.9k views

Not enough information to really answer. How are you integrating? Using Seurat's method or one of the wrappers? How many cell types would you expect based on your sample? Are you doing any differentiation? If so, the oddities in the UMAP structure would make more sense.


Thanks a lot for your answer. I expect 6-7 major cell types. The script I used to integrate my data is listed below:

    options(stringsAsFactors = FALSE)
    library(Seurat)
    library(dplyr)
    library(ggplot2)
    library(cowplot)

    # For KIHO66: load the 10X counts and create the Seurat object
    KIHO66.data <- Read10X(data.dir = "./")
    KIHO66 <- CreateSeuratObject(counts = KIHO66.data, min.cells = 3, min.features = 200, project = "KIHO66_geneX") # 23117 features across 7691 samples within 1 assay
    KIHO66 <- RenameCells(KIHO66, add.cell.id = "KIHO66")
    KIHO66[["percent.mt"]] <- PercentageFeatureSet(KIHO66, pattern = "^mt-")

    # QC plots
    pdf("KIHO66_1.pdf")
    VlnPlot(KIHO66, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
    dev.off()
    pdf("KIHO66_2.pdf")
    plot1 <- FeatureScatter(KIHO66, feature1 = "nCount_RNA", feature2 = "percent.mt")
    plot2 <- FeatureScatter(KIHO66, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
    CombinePlots(plots = list(plot1, plot2))
    dev.off()

    # Filter cells, normalize, and find variable features
    KIHO66 <- subset(KIHO66, subset = nFeature_RNA > 200 & nFeature_RNA < 9000 & percent.mt < 5) # 23117 features across 6087 samples within 1 assay
    KIHO66 <- NormalizeData(KIHO66, normalization.method = "LogNormalize", scale.factor = 10000)
    KIHO66 <- FindVariableFeatures(KIHO66, selection.method = "vst", nfeatures = 2000)
    KIHO66@meta.data$sample <- "KIHO66"
    KIHO66@meta.data$treatment <- "KI"

    # ... and another 10 samples ...

    # Integration
    reference.list <- list(KIHO66, KIHO82, KIHO96, KOHO37, KOHO41, WT23, WT26, WT84)
    geneX.anchors <- FindIntegrationAnchors(object.list = reference.list, dims = 1:30)
    geneX.integrated <- IntegrateData(anchorset = geneX.anchors, dims = 1:30)
    DefaultAssay(geneX.integrated) <- "integrated"
    geneX.integrated <- ScaleData(geneX.integrated, verbose = FALSE)
    geneX.integrated <- RunPCA(geneX.integrated, npcs = 30)
    # Seurat 3 takes `dims` here; the v2-style `dims.use` and `do.fast` are not RunTSNE arguments in v3
    geneX.integrated <- RunTSNE(object = geneX.integrated, dims = 1:30)
    geneX.integrated <- FindNeighbors(geneX.integrated, reduction = "pca", dims = 1:30)
    geneX.integrated <- FindClusters(geneX.integrated, resolution = 0.4)

    # Plot by treatment and by cluster
    p1 <- DimPlot(geneX.integrated, reduction = "tsne", group.by = "treatment")
    p2 <- DimPlot(geneX.integrated, reduction = "tsne", label = TRUE)
    plot_grid(p1, p2)
    DimPlot(geneX.integrated, reduction = "tsne", split.by = "treatment", label = TRUE)
5.0 years ago

Without access to the data, we can't do much to help. With these algorithms, setting the correct parameters is important. For t-SNE, the main one is perplexity; for UMAP, the main parameters are the number of neighbours and, to a lesser degree, the minimum distance. I suggest you become familiar with the algorithms and their parameters. The page "How to Use t-SNE Effectively" may help. For UMAP, try the page "Understanding UMAP", which also compares it with t-SNE. Also read the UMAP documentation.
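
To see how much these parameters matter, one option is to re-run the embeddings over a small grid of values and compare the plots side by side. The following is only a rough sketch, not a recommended recipe: the helper names (make_umap_grid, sweep_umap, sweep_tsne) and the particular values tried are made up for illustration, and it assumes an integrated Seurat 3 object on which RunPCA() has already been run with at least 30 components.

```r
# Sketch only: helper names and parameter values are illustrative assumptions.
# `obj` is assumed to be a Seurat (v3) object with a "pca" reduction
# containing at least max(dims) components.

# Build every combination of the UMAP parameters to try (plain base R).
make_umap_grid <- function(n_neighbors = c(15, 30, 50),
                           min_dist = c(0.1, 0.3, 0.5)) {
  expand.grid(n.neighbors = n_neighbors, min.dist = min_dist)
}

# One UMAP plot per row of the grid.
sweep_umap <- function(obj, grid = make_umap_grid(), dims = 1:30) {
  lapply(seq_len(nrow(grid)), function(i) {
    tmp <- Seurat::RunUMAP(obj, reduction = "pca", dims = dims,
                           n.neighbors = grid$n.neighbors[i],
                           min.dist = grid$min.dist[i],
                           verbose = FALSE)
    Seurat::DimPlot(tmp, reduction = "umap", label = TRUE) +
      ggplot2::ggtitle(sprintf("n.neighbors = %s, min.dist = %s",
                               grid$n.neighbors[i], grid$min.dist[i]))
  })
}

# One t-SNE plot per perplexity value; `perplexity` is passed on to Rtsne.
sweep_tsne <- function(obj, perplexities = c(10, 30, 50), dims = 1:30) {
  lapply(perplexities, function(p) {
    tmp <- Seurat::RunTSNE(obj, reduction = "pca", dims = dims,
                           perplexity = p)
    Seurat::DimPlot(tmp, reduction = "tsne", label = TRUE) +
      ggplot2::ggtitle(paste("perplexity =", p))
  })
}
```

With the object from the question, this could be used as plots <- sweep_umap(geneX.integrated) followed by cowplot::plot_grid(plotlist = plots, ncol = 3). The clustering itself (FindNeighbors/FindClusters) is not affected by these settings; only the 2D layout changes.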


Thank you for your nice advice.
