Question

How to determine which cell type each cluster represents in single-cell data?

0


3.4 years ago

Raz ▴ 10

I have a gene expression matrix and I would like to cluster it and find different cell-types.

Let's suppose we would like to cluster our gene expression matrix (genes × cells) using one of the available methods such as PCA, t-SNE, or whatever. Then let's imagine we have K clusters. Now we would like to determine which cell type each cluster represents. To do that, we find DEGs and sort them. Since DEGs are just a list of gene names, we should then determine which clusters the significant genes belong to, and classify the clusters based on these marker genes. Am I right? Or are there other ways?

I can use any clustering method, but I do not know how to classify my clusters at the end. Any help is appreciated.

single-cell DEGs clustering • 2.8k views


1


I think extracting the individual components of the PCA (PC1, PC2, ...) will give you a list of the genes that contribute most to "separating" the clusters (in terms of covariance). I believe that among the highest contributors you would find genes that are cell type-specific.

3.4 years ago by Marco Pannone ▴ 810

score 4 · Accepted Answer · 2021-11-18

Hey @Raz,

What you posted is pretty much the general outline. The Seurat tutorials could help you a bit as well: https://satijalab.org/seurat/articles/get_started.html

In short, you want to tune your clustering parameters until the clusters are biologically meaningful. How do you tell whether the clustering is good? Use FeaturePlot() on your genes of interest and compare it with the UMAP colored by cluster to see whether each marker falls into its own cluster. You should probably also run a DGE analysis to see which other markers are co-expressed within those clusters. Sometimes you have to go back and forth between clustering and FeaturePlot()/DGE analysis to tune the clustering to what you're after in the dataset.

If you are using Seurat, FindAllMarkers() compares each cluster against all other clusters combined, and does this for every cluster. Once you have a preliminary DGE analysis you can go back and refine the clustering, and if you already know unique markers for the major cell types you can simply visualize them with FeaturePlot() and compare against the UMAP of your preliminary clustering. See the Biostars post "General question about clustering in scRNAseq".

Say you find one large mesenchymal cluster in UMAP space that you want to split into subpopulations. You can adjust the clustering parameters to "break apart" the large mesenchymal cluster. In Seurat, specifically, you can increase the resolution ("res") in FindClusters(), and maybe also raise dims (the principal components (PCs) that are used, e.g. in FindNeighbors()). The other clustering parameters are worth trying too, but I found that increasing the resolution does the trick most of the time. After each parameter change you can rerun RunUMAP() and DimPlot() to visualize the clustering again and check whether it now looks right.
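To make the "each cluster vs all other clusters" idea concrete, here is a minimal Python sketch using only the standard library. It is not Seurat's implementation: FindAllMarkers() runs a statistical test (a Wilcoxon rank-sum test by default) and applies filtering thresholds, whereas this toy only ranks genes by the difference in mean log-expression. The expression values, the cluster labels and the helper rank_markers() are all made up for illustration.

```python
from statistics import mean

# Toy log-normalized expression: gene -> one value per cell (hypothetical numbers).
expression = {
    "PDGFRA": [2.1, 1.9, 2.3, 0.1, 0.0, 0.2],
    "ACTA2": [0.2, 0.1, 0.0, 2.5, 2.2, 2.4],
    "GAPDH": [3.0, 3.1, 2.9, 3.0, 3.1, 2.9],
}
# Cluster label of each cell, in the same order as the expression values.
clusters = [0, 0, 0, 1, 1, 1]


def rank_markers(expression, clusters, cluster_id):
    """Rank genes by mean expression inside `cluster_id` minus the mean in all other cells."""
    scores = {}
    for gene, values in expression.items():
        inside = [v for v, c in zip(values, clusters) if c == cluster_id]
        outside = [v for v, c in zip(values, clusters) if c != cluster_id]
        scores[gene] = mean(inside) - mean(outside)
    # Largest positive difference first: these are the candidate markers.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Repeat the one-vs-rest comparison for every cluster.
all_markers = {c: rank_markers(expression, clusters, c) for c in sorted(set(clusters))}
for cluster_id, ranked in all_markers.items():
    print(cluster_id, [gene for gene, _ in ranked])
```

In this toy data the housekeeping-like gene (GAPDH) lands in the middle of both rankings because it is expressed equally everywhere, which is why a gene being highly expressed is not enough to make it a marker.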
Finally, you could then use FindMarkers(), setting ident.1 to one mesenchymal subcluster and ident.2 to the rest of the mesenchymal subclusters, to find the DEGs that are unique to each subpopulation and classify cell subtypes (i.e. rather than saying this large group is mesenchymal cells, you could say "this mesenchymal subpopulation is PDGFRa+ fibroblasts, this other mesenchymal subpopulation is ACTA2+ myofibroblasts", etc.). Some folks even go as far as detailing location-specific markers of the cell types, based on where the tissue for scRNA-seq was isolated. A lot of this is going back and forth, moving up and down the pipeline until you find the "perfect fit" for what you're after.

As an addition: once your clusters/cell types are optimized and hopefully identified, there are tools to help figure out signalling interactions between clusters, if you're interested in that. I have not used any, but there are also tools that will do the cell type annotation for you, provided, I believe, that you have optimized the clustering. I think the cell type annotation tools are particularly useful for immune cells, because there are just so many immune cell subtypes!
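The last step, turning marker genes into cluster labels, can be sketched in a few lines of standard-library Python. This is only an illustration of the idea and not how any particular annotation tool works. The marker panel reuses the two genes mentioned above, while the per-cluster mean expression values and the helper annotate() are hypothetical; a real panel should come from the literature for your tissue.

```python
from statistics import mean

# Hypothetical marker panel: cell type -> marker genes.
known_markers = {
    "fibroblast": ["PDGFRA"],
    "myofibroblast": ["ACTA2"],
}

# Toy mean log-expression of each gene per subcluster (made-up values).
cluster_means = {
    "subcluster_0": {"PDGFRA": 2.1, "ACTA2": 0.1},
    "subcluster_1": {"PDGFRA": 0.1, "ACTA2": 2.4},
}


def annotate(cluster_means, known_markers):
    """Label each cluster with the cell type whose markers have the highest mean expression."""
    labels = {}
    for cluster, gene_means in cluster_means.items():
        scores = {
            # Genes missing from the data count as not expressed (0.0).
            cell_type: mean([gene_means.get(gene, 0.0) for gene in genes])
            for cell_type, genes in known_markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


labels = annotate(cluster_means, known_markers)
print(labels)
```

A label assigned this way is only as good as the marker panel, so it is worth checking each call against FeaturePlot() before trusting it.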

Hope this helps!