Question

Inferring cell identity/genotype in single cells with missing information

9 months ago

txema.heredia ▴ 200

Hi,

I have been asked to analyze a single cell dataset as follows:

Mouse; WT vs Fib4 mutant; 4 samples (2+2), 1 male + 1 female in each genotype.
Two 10x sequencing runs, each superloaded with 2 biological samples multiplexed with a hashtag antibody:
- run #1:
  - sample 1: Male; Fib4 mutant
  - sample 2: Female ; WT
- run #2:
  - sample 3: Female; Fib4 mutant
  - sample 4: Male ; WT

Unfortunately, the hashtag antibody signal turned out to be unusable in the sequencing data (it looked fine in the wet lab before sequencing, though). I am left with 2 "samples" (12k + 6k cells), each combining cells of both sexes and both genotypes. I am trying to salvage what I can from the analysis.

As each run combined one male and one female sample, I used the expression of sex-specific genes (F: Xist; M: Ddx3y, Eif2s3y, Kdm5d, Uty) to assign cells to their sample of origin. I took the raw counts for each group of genes and assigned a cell to a sex if it had >0 reads for that group. This classified 54% of the cells by sex in run #1 and 69% in run #2:

run	both	F	M	none
#1	130	2434	4187	5387
#2	95	3273	954	1796
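A minimal base-R sketch of the >0-reads rule described above, assuming a genes x cells raw count matrix. The toy matrix `counts` and the helper `classify_sex` are hypothetical names used only for illustration; the values are made up.

```r
# Sex-specific genes as listed in the post
female_genes <- c("Xist")
male_genes   <- c("Ddx3y", "Eif2s3y", "Kdm5d", "Uty")

# Toy genes x cells raw count matrix (hypothetical values, filled column by column)
counts <- matrix(
  c(3, 0, 0, 0, 0,   # cell1: Xist only        -> F
    0, 1, 0, 2, 0,   # cell2: Y genes only     -> M
    1, 0, 1, 0, 0,   # cell3: both groups      -> both
    0, 0, 0, 0, 0),  # cell4: nothing detected -> none
  nrow = 5,
  dimnames = list(c(female_genes, male_genes), paste0("cell", 1:4))
)

# Hypothetical helper: label each cell by which gene group has >0 raw reads
classify_sex <- function(counts, female_genes, male_genes) {
  f_genes <- intersect(female_genes, rownames(counts))
  m_genes <- intersect(male_genes, rownames(counts))
  has_f <- colSums(counts[f_genes, , drop = FALSE]) > 0
  has_m <- colSums(counts[m_genes, , drop = FALSE]) > 0
  sex <- ifelse(has_f & has_m, "both",
         ifelse(has_f, "F",
         ifelse(has_m, "M", "none")))
  setNames(sex, colnames(counts))
}

sex <- classify_sex(counts, female_genes, male_genes)
table(sex)
```

With a sparse count matrix taken from a Seurat object, `Matrix::colSums` would probably be needed in place of the base `colSums`.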

Knowing this (VERY IMPERFECT) classification, I was able to assign a genotype to those cells.

From there, I merged both sequencing runs into a single Seurat object with 18k cells classified by genotype:

Fib4	WT	NA
7460	3388	7408
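The run + sex to genotype assignment can be sketched as a lookup against the experimental design listed at the top. This is a base-R illustration only; `design`, `cells` and `assign_genotype` are hypothetical names, and cells labelled "both" or "none" simply get NA.

```r
# Experimental design from the post: one row per biological sample
design <- data.frame(
  run = c("#1", "#1", "#2", "#2"),
  sex = c("M", "F", "F", "M"),
  gt  = c("Fib4", "WT", "Fib4", "WT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map each (run, sex) pair to a genotype, NA if no match
assign_genotype <- function(run, sex, design) {
  lookup <- setNames(design$gt, paste(design$run, design$sex))
  unname(lookup[paste(run, sex)])
}

# Toy per-cell table (made-up cells, for illustration only)
cells <- data.frame(
  run = c("#1", "#1", "#2", "#2", "#1"),
  sex = c("M", "F", "F", "M", "none"),
  stringsAsFactors = FALSE
)
cells$gt <- assign_genotype(cells$run, cells$sex, design)
table(cells$gt, useNA = "ifany")
```

Applied to the sex table above, this mapping gives 4187 + 3273 = 7460 Fib4 cells, 2434 + 954 = 3388 WT cells, and 130 + 5387 + 95 + 1796 = 7408 unassigned cells, which matches the merged counts.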

The Fib4 mutants are a mouse model of frailty. Because of this, either the tissue/cell composition of the original samples or the ability of cells to survive tissue dissociation differs between genotypes. After a first round of naive clustering, I can see clear differences in the abundance of WT/Fib4 cells in several clusters:

tSNE

There are also several clusters dominated by cells with no assigned genotype:

table clusters

Because of this, I am trying to find some way to classify as many of the NA cells as possible into one of the two genotypes. What I have tried so far is:

1. Select the largest cluster with the highest number of both WT and Fib4 cells (cluster 0).
2. Run FindMarkers on the cluster to detect markers that can distinguish the two genotypes.
3. Use the top up and down markers to create a WT.score and a Fib4.score by running AddModuleScore with those gene lists on the whole dataset.
4. Classify cells according to those 2 scores.

library(Seurat)
library(dplyr)
library(ggplot2)

# Subset cluster 0 and use the genotype column ("gt") as the identity
ss <- subset(seu, subset = seurat_clusters == 0)
Idents(ss) <- "gt"

# Genotype markers within cluster 0 (default test kept for reference, ROC test used)
# gt_cl0_markers <- FindMarkers(ss, ident.1 = "Fib4", ident.2 = "WT")
gt_cl0_markers <- FindMarkers(ss, ident.1 = "Fib4", ident.2 = "WT", logfc.threshold = 0.25, test.use = "roc", only.pos = FALSE)

# Top 5 markers by classification power in each direction
gt_cl0_up <- rownames(gt_cl0_markers[gt_cl0_markers$avg_log2FC > 0, ] %>% top_n(5, power))
gt_cl0_down <- rownames(gt_cl0_markers[gt_cl0_markers$avg_log2FC < 0, ] %>% top_n(5, power))
gt_cl0_markers$dir <- ifelse(gt_cl0_markers$avg_log2FC >= 0, "up", "down")

# One module score per marker set (AddModuleScore appends "1" to the name)
ss <- AddModuleScore(ss, features = list(gt_cl0_up), name = "seu_fib4_cl0_up", assay = "RNA", slot = "data")
ss <- AddModuleScore(ss, features = list(gt_cl0_down), name = "seu_fib4_cl0_down", assay = "RNA", slot = "data")

md <- ss@meta.data

# Up-score vs down-score, one panel per genotype
ggplot(md, aes(x = seu_fib4_cl0_up1, y = seu_fib4_cl0_down1)) +
  geom_abline(slope = 1, linetype = "dashed") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_point(alpha = 0.25, aes(color = gt)) +
  facet_wrap(~gt) + theme_minimal() + theme(aspect.ratio = 1) +
  ggtitle("Genotype") +
  guides(color = guide_legend(override.aes = list(alpha = 1)))
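The last step of the list (classifying cells from the two scores) is not part of the code above. One simple rule, which is an assumption for illustration and not necessarily what was run, is to call a cell Fib4 when its up-score exceeds its down-score (the dashed diagonal in the plot) and then check the agreement on the cells whose genotype is already known. A base-R sketch on a toy metadata table (`md_toy` and its values are hypothetical; the column names follow the code above):

```r
# Toy metadata: known genotype (NA = unassigned) plus the two module scores
md_toy <- data.frame(
  gt = c("Fib4", "Fib4", "WT", "WT", NA),
  seu_fib4_cl0_up1   = c(0.8, 0.1, -0.2, 0.3, 0.5),
  seu_fib4_cl0_down1 = c(0.1, 0.4, 0.6, 0.2, -0.1),
  stringsAsFactors = FALSE
)

# Assumed rule: up-score above down-score -> Fib4, otherwise WT
md_toy$gt_pred <- ifelse(md_toy$seu_fib4_cl0_up1 > md_toy$seu_fib4_cl0_down1, "Fib4", "WT")

# Evaluate only on cells that already have a sex-derived genotype
known <- !is.na(md_toy$gt)
confusion <- table(truth = md_toy$gt[known], predicted = md_toy$gt_pred[known])
accuracy <- mean(md_toy$gt[known] == md_toy$gt_pred[known])

confusion
accuracy
```

If the accuracy on labelled cells is close to what a majority-class guess would give, the scores carry little genotype information. Agreement measured on the same cells that were used to pick the markers is also likely to be optimistic.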

Unfortunately, the scores created from the cluster/genotype markers don't seem able to separate the genotypes:

scores all cells

They don't separate them well even when applied only to the very same cluster used to find the markers:

scores cluster 0

I have tried this with both the default FindMarkers test and test.use = "roc", using the top 5, 20, and 50 markers in each direction.

Is this the right way to infer the genotype/grouping of cells with missing information? Am I doing something wrong? How should I classify cells based on these differentially expressed genes? Or am I doing everything right and simply out of luck with these samples?

Thanks, Txema

cell seurat single score • 294 views
