Question

scrna seq analysis - doublets

2

Entering edit mode

19 months ago

bioinformatics ▴ 60

Hello,

I am trying to perform scrna seq analysis in R using this workflow https://www.singlecellcourse.org/single-cell-rna-seq-analysis-using-seurat.html

I have tried to enter this code, however I do not have the file? Do you know what this file is or how I can create one?

doublets <- read.table("data/update/scrublet_calls.tsv",header = F,row.names = 1)

*Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'data/update/scrublet_calls.tsv': No such file or directory*

I think the purpose of this line of code is to add the doublet annotation generated by scrublet to the Seurat object metadata.

I used patient 1 data from this link: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM4952363

Kind regards,

scrna seq code r • 2.5k views

ADD COMMENT • link updated 19 months ago by Nitin Narwade ★ 1.6k • written 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

google is your friend here.

ADD REPLY • link 19 months ago by fracarb8 ★ 1.7k

0

Entering edit mode

Ok thank you. Is that link to the file used in this code?

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

Also how do I download this as tsv file?

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

Yes, you are right. This is a file where you have annotations about cells that are possibly singlets or doublets. You don't want to include doublets that will hamper downstream analysis.

You can run Scrublet on the raw count and the command is very simple. It is a Python-based tool.

There are other options for doublet detection, one of them is scDblFinder, an R-Bioconductor package.

Regards,

Nitin N.

ADD REPLY • link 19 months ago by Nitin Narwade ★ 1.6k

0

Entering edit mode

Ok thanks. Does anyone know where I have gone wrong? My table does not show doublet scores.

doublets <- read.table("data/update/scrublet_calls.tsv",header = F,row.names = 1)
colnames(doublets) <- c("Doublet_score","Is_doublet")
srat <- AddMetaData(srat,doublets)
head(srat[[]])


head(srat[[]])

                   orig.ident nCount_RNA nFeature_RNA Doublet_score Is_doublet
AAACCCAAGAGCAGTC-1    pbmc10k      86118         8075            NA       <NA>
AAACCCAAGATAGCTA-1    pbmc10k       1440          840            NA       <NA>
AAACCCAAGGCCACCT-1    pbmc10k        523          391            NA       <NA>
AAACCCAGTAATTAGG-1    pbmc10k      13694         3582            NA       <NA>
AAACCCAGTGACAGCA-1    pbmc10k       8048         2838            NA       <NA>
AAACCCATCACATTGG-1    pbmc10k      38350         5272            NA       <NA>
                   percent.mt percent.rb
AAACCCAAGAGCAGTC-1   6.730300  25.295525
AAACCCAAGATAGCTA-1   1.875000  15.486111
AAACCCAAGGCCACCT-1   3.824092  20.076482
AAACCCAGTAATTAGG-1   5.360012  12.772017
AAACCCAGTGACAGCA-1  20.526839   3.566103
AAACCCATCACATTGG-1   8.453716  24.633638

Also, after quality control I only have 29 cells that passed and are considered good quality. Is it expected to have a higher count of cells that passed the quality control criteria?

table(srat[['QC']])

QC
                     Doublet                      High_MT 
                           4                            1 
        High_MT,Low_nFeature High_MT,Low_nFeature,Doublet 
                           4                            1 
                        Pass 
                          29

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

1I also ended up with this plot which I don't think is correct from the published paper. The clusters should be myeloid cells 1, myeloid cells 2, plasmocytes, osteoblastic OS cells, Osteoclasts, Endothelial cells, NK/T cells, CAFs, B cells. Do you know why the cell annotation is not working and do you think I should follow a different workflow?

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

enter image description here

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

Did you check whether the cell barcodes of your Seurat object are matching with the doublets?

Additionally, can you post the actual code you are using for this analysis, without looking at the code?

ADD REPLY • link 19 months ago by Nitin Narwade ★ 1.6k

0

Entering edit mode

library(Seurat)
library(ggplot2)
library(SingleR)
library(dplyr)
library(celldex)
library(RColorBrewer)
library(SingleCellExperiment)

adj.matrix <- Read10X("GSM4952363_OS_1")

srat <- CreateSeuratObject(adj.matrix,project = "pbmc10k") 

srat

 adj.matrix <- NULL
str(srat)

meta <- srat@meta.data
dim(meta)

head(meta)

summary(meta$nCount_RNA)
summary(meta$nFeature_RNA)

srat[["percent.mt"]] <- PercentageFeatureSet(srat, pattern = "^MT-")

srat[["percent.rb"]] <- PercentageFeatureSet(srat, pattern = "^RP[SL]")

doublets <- read.table("scrublet_calls.tsv",header = F,row.names = 1)
colnames(doublets) <- c("Doublet_score","Is_doublet")
srat <- AddMetaData(srat,doublets)
head(srat[[]])

VlnPlot(srat, features = c("nFeature_RNA","nCount_RNA","percent.mt","percent.rb"),ncol = 4,pt.size = 0.1) & 
  theme(plot.title = element_text(size=10))

FeatureScatter(srat, feature1 = "nCount_RNA", feature2 = "percent.mt")

FeatureScatter(srat, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")

FeatureScatter(srat, feature1 = "nCount_RNA", feature2 = "percent.rb")

FeatureScatter(srat, feature1 = "percent.rb", feature2 = "percent.mt")

FeatureScatter(srat, feature1 = "nFeature_RNA", feature2 = "Doublet_score")

srat[['QC']] <- ifelse(srat@meta.data$Is_doublet == 'True','Doublet','Pass')
srat[['QC']] <- ifelse(srat@meta.data$nFeature_RNA < 500 & srat@meta.data$QC == 'Pass','Low_nFeature',srat@meta.data$QC)
srat[['QC']] <- ifelse(srat@meta.data$nFeature_RNA < 500 & srat@meta.data$QC != 'Pass' & srat@meta.data$QC != 'Low_nFeature',paste('Low_nFeature',srat@meta.data$QC,sep = ','),srat@meta.data$QC)
srat[['QC']] <- ifelse(srat@meta.data$percent.mt > 15 & srat@meta.data$QC == 'Pass','High_MT',srat@meta.data$QC)
srat[['QC']] <- ifelse(srat@meta.data$nFeature_RNA < 500 & srat@meta.data$QC != 'Pass' & srat@meta.data$QC != 'High_MT',paste('High_MT',srat@meta.data$QC,sep = ','),srat@meta.data$QC)
table(srat[['QC']])

VlnPlot(subset(srat, subset = QC == 'Pass'), 
        features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.rb"), ncol = 4, pt.size = 0.1) & 
  theme(plot.title = element_text(size=10))

srat <- NormalizeData(srat)

srat <- FindVariableFeatures(srat, selection.method = "vst", nfeatures = 2000)

top10 <- head(VariableFeatures(srat), 10)
top10 

plot1 <- VariableFeaturePlot(srat)
LabelPoints(plot = plot1, points = top10, repel = TRUE, xnudge = 0, ynudge = 0)

all.genes <- rownames(srat)
srat <- ScaleData(srat, features = all.genes)

srat <- RunPCA(srat, features = VariableFeatures(object = srat))

VizDimLoadings(srat, dims = 1:9, reduction = "pca") & 
  theme(axis.text=element_text(size=5), axis.title=element_text(size=8,face="bold"))

DimPlot(srat, reduction = "pca")

ElbowPlot(srat)

srat <- FindNeighbors(srat, dims = 1:10)

 srat <- FindClusters(srat, resolution = 0.5)

srat <- RunUMAP(srat, dims = 1:10, verbose = F)

table(srat@meta.data$seurat_clusters)

DimPlot(srat,label.size = 4,repel = T,label = T)

all.markers <- FindAllMarkers(srat, only.pos = T, min.pct = 0.5, logfc.threshold = 0.5)

dim(all.markers)

table(all.markers$cluster)

top3_markers <- as.data.frame(all.markers %>% group_by(cluster) %>% top_n(n = 3, wt = avg_log2FC))
top3_markers

monaco.ref <- celldex::MonacoImmuneData()
sce <- as.SingleCellExperiment(DietSeurat(srat))
sce

monaco.main <- SingleR(test = sce,assay.type.test = 1,ref = monaco.ref,labels = monaco.ref$label.main) monaco.fine <- SingleR(test = sce,assay.type.test = 1,ref = monaco.ref,labels = monaco.ref$label.fine)

table(monaco.main$pruned.labels)

table(monaco.fine$pruned.labels)

table(monaco.fine$pruned.labels)

srat@meta.data$monaco.main <- monaco.main$pruned.labels
srat@meta.data$monaco.fine <- monaco.fine$pruned.labels

srat <- SetIdent(srat, value = "monaco.fine")
DimPlot(srat, label = T , repel = T, label.size = 3) + NoLegend()

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

Thanks.

About doublets, what is the output of the table(rownames(doublets) %in% rownames(srat@meta.data)) command?

About cell type annotation, I would suggest replicating the methods used in the original publication, parameters (number of PCs and resolution), and all. How did they annotate their clusters? Did they use the same reference & SingleR OR used custom markers?

All the best.

Regards,

Nitin N.

ADD REPLY • link 19 months ago by Nitin Narwade ★ 1.6k

0

Entering edit mode

 table(rownames(doublets) %in% rownames(srat@meta.data))

FALSE  TRUE 
10155    39

The paper used the SubsetData function in the Seurat package, and then the FindClusters function (resolution = 0.10) to analyse the different cell types.

ADD REPLY • link 19 months ago by bioinformatics ▴ 60

0

Entering edit mode

There is a serious mismatch between the barcodes in your doublets and the Seurat object. you need to check that and correct one of them.

For the cell type annotation, if the methodology is not clear in the original publication, I would suggest going for Denovo annotation and reference-based. perform analysis at different PC and resolutions, perform DGE between the clusters, and use markers to define the population.

NOTE: Before subset clustering you need to know what you are subsetting and for that, you need to annotate superclusters.

All the best.

Regards,

Nitin N.

ADD REPLY • link 19 months ago by Nitin Narwade ★ 1.6k