Selecting specific non-consecutive PCs for scRNAseq analysis
2.5 years ago
eturkes • 0

Hi everyone,

I am working with data where we have a short list of genes (fewer than 50 in total, split into 10 or so groups) that we want to use to cluster our data at a broad level before subclustering each broad cluster. I've been toying with the idea of running UMAP and graph-based clustering on only the PCs that capture the most variation in my short list of genes. This was straightforward to implement, and I am now evaluating the quality of the results. Meanwhile, can anyone point me to a reference where this approach has been taken before, or share personal experience? It's not something I've seen, but intuitively it makes sense to me if you want supervised clusters in accordance with curated genes. I like using PCs because the selected ones should also capture genes outside my original list that are coexpressed with it. I am concerned, though, that using non-consecutive PCs seems particularly unconventional.
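For readers who want something concrete, here is a minimal sketch of one way such a selection could be done. It is not the poster's actual code: the scoring rule (share of each PC's squared loadings that falls on the curated genes), the function name `rank_pcs_by_gene_set`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def rank_pcs_by_gene_set(X, gene_idx, n_pcs=20, n_keep=5):
    """Pick the PCs whose loadings are most concentrated on a curated gene set.

    X        : cells x genes expression matrix (already normalized/scaled)
    gene_idx : column indices of the curated genes
    Returns the 0-based indices of the kept PCs, the per-PC score, and the
    cell scores on the kept PCs. The scoring rule is an assumption.
    """
    Xc = X - X.mean(axis=0)                      # center each gene
    # Rows of Vt are the PC loading vectors (unit length).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vt = Vt[:n_pcs]
    # Share of each PC's squared loadings carried by the curated genes (0..1).
    share = (Vt[:, gene_idx] ** 2).sum(axis=1)
    # Keep the top-scoring PCs; these are generally non-consecutive.
    keep = np.sort(np.argsort(share)[::-1][:n_keep])
    scores = Xc @ Vt[keep].T                     # cells x n_keep
    return keep, share, scores

# Toy data (hypothetical): 200 cells, 100 genes, two groups that differ
# only in the first 10 genes, which play the role of the curated list.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
group = rng.integers(0, 2, size=200)
X[:, :10] += 5 * group[:, None]

keep, share, scores = rank_pcs_by_gene_set(X, np.arange(10))
print("kept PCs (0-based):", keep)
```

The `scores` matrix would then stand in for the usual leading PCs when building the neighbor graph or the UMAP. Weighting the score by each PC's variance is an equally plausible choice and may select different components.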

reduction dimensionality principal scRNAseq components • 873 views
2.5 years ago
Mensur Dlakic ★ 28k

I have no particular expertise in what you are trying to do.

This strikes me as a very biased approach, and I'd be surprised if it has any future in general use. Picking your own genes to analyze is fine, and you can probably get a UMAP embedding from just that subset. The rest of the genes can then be embedded based on what was learned from the small subset. That will still carry a bias, but it would be a bias that is implied and understood. I think doing it in a contrived way, where one gets to pick and choose which PCs are used, is a different kind of bias.

Thank you for your comment! I didn't know a UMAP embedding can be learnt from a subset like that. The UMAP documentation seems to have a section discussing this, so I'll look it over (https://umap-learn.readthedocs.io/en/latest/supervised.html).

Concerning bias, I'm not sure why it's an issue in the first place if I'm intentionally trying to separate my cells based on a restricted set of genes? It just strikes me as sensible if one wants a broad grouping first (e.g. EX neurons, IN neurons, Glia) before going into those groups for fine-grained analysis.

"Concerning bias, I'm not sure why its an issue in the first place if I'm intentionally trying to separate my cells based off a restricted set of genes?"

In my opinion, the issue is that the cells may not actually separate based on a restricted set of genes. But if you pick a restricted set of PCs derived from a restricted set of genes, chances are that you can separate them any way you want, because you get to pick exactly which variables work for your intended clustering. That's why I suggested UMAP on a restricted set of genes, because that at least removes the second type of bias.
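The point about picking variables can be illustrated with a small simulation. This is only a sketch under assumed settings (pure-noise data, random labels, and a simple standardized mean difference as the separation measure; none of these come from the thread): PCs chosen after the fact for a given labeling will always look at least as good for that labeling as the leading PCs, even when the data contain no real groups.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_genes = 300, 40
X = rng.normal(size=(n_cells, n_genes))      # pure noise: no real cell groups
labels = rng.integers(0, 2, size=n_cells)    # arbitrary "intended" clustering

# PCA via SVD of the centered matrix; columns of `scores` are PC scores.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S

def separation(pc_scores, labels):
    """Per-PC absolute difference in group means, in units of the PC's std."""
    a = pc_scores[labels == 0]
    b = pc_scores[labels == 1]
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pc_scores.std(axis=0)

sep = separation(scores, labels)
leading = sep[:3]                            # the conventional choice: PC1-PC3
cherry = np.sort(sep)[::-1][:3]              # the three PCs that best fit the labels

print("mean separation, leading PCs:    ", leading.mean())
print("mean separation, hand-picked PCs:", cherry.mean())
```

The hand-picked set cannot score lower than the leading set, because it is chosen to maximize the very quantity being reported. That is the second kind of bias described above.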

Ahh I see, yes, that makes sense. Thanks for taking the time to explain.
