Hello everyone, I'm working on a single-cell RNA-seq analysis using Seurat. I've generated the UMAPs and PCA plots, and worked on the differential expression between the clusters. I'm now looking at the PCA, which gives a graphical view of how similar the clusters are, and I was asking myself the following question: is there a simple way to compute a similarity score between two clusters, to get an answer like "cluster 1 is 80% similar to cluster 2"?
I was thinking of two ways of doing this:
- Using the PCA coordinates of the cells (the data behind the PCA plot)
- Using the differential expression data (by using FindMarkers)
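For the first option, a rough sketch of the idea: average the PCA coordinates of the cells in each cluster to get one centroid per cluster, then compare centroids by Euclidean distance. This is plain Python rather than Seurat code, and the `pca_by_cluster` values are made-up toy numbers, not real embeddings. Note that a distance like this only ranks cluster pairs relative to each other; it does not translate directly into a statement such as "80% similar".

```python
import math

def centroid(points):
    """Mean position of a list of cells, one coordinate per principal component."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical PCA coordinates (cells x PCs), grouped by cluster label.
pca_by_cluster = {
    "cluster1": [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]],
    "cluster2": [[1.1, 2.1], [0.9, 1.9], [1.0, 2.0]],
    "cluster3": [[-3.0, 0.5], [-2.8, 0.7], [-3.2, 0.3]],
}

centroids = {name: centroid(cells) for name, cells in pca_by_cluster.items()}
names = sorted(centroids)

# Pairwise distances between cluster centroids (smaller = more similar).
distances = {
    (a, b): euclidean(centroids[a], centroids[b])
    for a in names
    for b in names
}

for a in names:
    print(a, [round(distances[(a, b)], 3) for b in names])
```

In practice you would use more than two PCs (for example the same ones used for clustering), since the first two components alone can hide differences between clusters.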
Hi, would the pseudobulk approach be just to reduce the distances that one has to calculate?
I've been hearing a lot about this pseudobulk idea, but it's still not super clear to me when it should be used
Mostly, yes. It will also rein in outliers. You can take this approach with all cells, and it'll probably work fine, but it may take a good while to run and will require a lot more memory. In addition, the output may not be as clean, since some cells will group with cells from other, similar clusters.
Point being, if you want to compare the clusters, you might as well compare the clusters rather than their constituent elements. Pseudobulking is useful in DE for certain single-cell analyses, as the single-cell-specific approaches tend to return a lot of false positives due to the sparsity of the data. Pseudobulking makes the use of bulk RNA-seq testing methods appropriate, and they tend to return more robust results for inter-condition contrasts. You can read more about it in this OSCA chapter.
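To make the pseudobulk idea concrete, here is a minimal sketch in plain Python (not Seurat or OSCA code): sum the raw counts of all cells in each cluster into one profile per cluster, log-normalize each profile, and compare profiles with a Pearson correlation. The `counts` matrix, the `labels`, and the log2(CPM + 1) normalization are illustrative assumptions; a real analysis would typically pseudobulk per cluster and per sample, and would use the normalization from your usual bulk workflow.

```python
import math

def pseudobulk(counts, labels):
    """Sum raw counts per gene across all cells sharing a cluster label."""
    profiles = {}
    for cell_counts, label in zip(counts, labels):
        if label not in profiles:
            profiles[label] = [0] * len(cell_counts)
        profiles[label] = [t + c for t, c in zip(profiles[label], cell_counts)]
    return profiles

def log_cpm(profile):
    """log2(counts per million + 1) for one pseudobulk profile."""
    total = sum(profile)
    return [math.log2(c / total * 1e6 + 1) for c in profile]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy data: 5 cells x 4 genes of raw counts, with one cluster label per cell.
counts = [
    [10, 0, 5, 1],
    [8, 1, 6, 0],
    [9, 0, 4, 2],
    [0, 12, 1, 7],
    [1, 10, 0, 9],
]
labels = ["c1", "c1", "c2", "c3", "c3"]

profiles = pseudobulk(counts, labels)
normalized = {label: log_cpm(p) for label, p in profiles.items()}

# Pairwise correlation between clusters; 1 - correlation can serve as a distance.
clusters = sorted(normalized)
similarity = {
    (a, b): pearson(normalized[a], normalized[b])
    for a in clusters
    for b in clusters
}

for a in clusters:
    print(a, [round(similarity[(a, b)], 2) for b in clusters])
```

Because there is only one profile per cluster, the resulting matrix is tiny (clusters x clusters) no matter how many cells you have, which is where the speed and memory savings come from.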
Okay, thank you for your response. I'll try using the distance matrix from the pseudobulking process; I hadn't thought of using it like this.