Identifying Immune Cell Subsets - Favour Reference-Based or Marker-Based Label
2.9 years ago
kebl4660 ▴ 10

Hi,

I have been trying different methods to identify immune sub-populations.

For cell identification in general, the most commonly used method in papers seems to be SingleR; others cluster the cells and then define the clusters based on the top highly variable genes (HVGs).

Since there is no gold standard to date, I have tried SingleR with different references (fine labels), ProjecTILs, scPred, and identifying markers for different clusters, but none of these methods agree with each other! For major labels this is less of a problem.

So, for example, how can I (or any paper, for that matter) justify that a cell is a CD4 effector memory T cell if none of these methods agree on the label? Do people just base their cluster or cell annotation on the method that best fits their bias? It would be great to hear some opinions on how to move forward with cell subtype identification in my data. Thank you!

scrna • 1.2k views
2.9 years ago

It's a tricky topic for sure, and annotations of rare or uncertain subtypes conflict across datasets all the time. In reality, there is no easy answer, other than finding a reference dataset that you trust/agree with and using that, or coming up with a marker list that works well for your cell type(s) of interest.

SingleR (and other such methods) work well for less fine-grained labels, but I still end up looking through marker lists and the literature more than I'd like. My hope is that as these ginormous datasets keep getting smashed together and all, somebody will try to ameliorate some of these issues by re-annotating and making available solid marker lists.
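For what it's worth, here is a minimal sketch of the marker-list route, assuming normalized expression is available as a gene-to-value dict per cell. The marker sets are illustrative placeholders rather than a vetted panel, `score_cell` and `label_cell` are hypothetical helper names, and the thresholds are arbitrary. The point is that a cell is left unassigned when the best score is weak or too close to the runner-up, instead of being forced into a label.

```python
from statistics import mean

# Illustrative marker lists only (assumption); substitute a panel you trust.
MARKERS = {
    "CD4 T": ["CD3E", "CD4"],
    "CD8 T": ["CD3E", "CD8A"],
    "B": ["CD79A", "MS4A1"],
}

def score_cell(expr, genes):
    """Mean expression of the marker genes; genes missing from the cell count as 0."""
    return mean(expr.get(g, 0.0) for g in genes)

def label_cell(expr, markers=MARKERS, min_score=0.5, min_margin=0.25):
    """Best-scoring label, or 'unassigned' if the call is weak or ambiguous."""
    scores = sorted(
        ((score_cell(expr, genes), label) for label, genes in markers.items()),
        reverse=True,
    )
    (best, label), (second, _) = scores[0], scores[1]
    if best < min_score or best - second < min_margin:
        return "unassigned"
    return label

# Toy cells with made-up normalized expression values.
cells = {
    "cell_1": {"CD3E": 2.1, "CD4": 1.8},
    "cell_2": {"CD3E": 1.9, "CD8A": 2.4},
    "cell_3": {"CD3E": 1.5, "CD4": 0.9, "CD8A": 1.0},  # CD4 and CD8 scores nearly tied
    "cell_4": {"MS4A1": 0.2},                          # weak signal overall
}
labels = {cell_id: label_cell(expr) for cell_id, expr in cells.items()}
```

In this toy input, `cell_3` and `cell_4` stay unassigned, which is usually more honest than picking the nearest fine-grained label.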


I think the annotation tools are great starting points. I have been using them to look at the distribution of labels alongside the dimensional reduction; here is an example I made using ProjecTILs: [image]

Then I look at the assignments by cluster (relative proportions) and finally, as @jared.andrews07 mentions, do a manual annotation with canonical markers. For me, annotation is the most time-consuming step of most projects.
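The "relative proportions by cluster" step can be sketched roughly as follows, assuming you have exported one cluster ID and one predicted label per cell. The label strings are toy values, not actual output names from any tool, and `label_proportions`, `dominant_label`, and the 0.7 cutoff are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict

def label_proportions(clusters, labels):
    """For each cluster, the fraction of its cells given each predicted label."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {
        cluster: {lab: n / sum(c.values()) for lab, n in c.items()}
        for cluster, c in counts.items()
    }

def dominant_label(props, min_fraction=0.7):
    """Dominant label per cluster; 'mixed' when no label reaches min_fraction."""
    out = {}
    for cluster, fractions in props.items():
        label, frac = max(fractions.items(), key=lambda kv: kv[1])
        out[cluster] = label if frac >= min_fraction else "mixed"
    return out

# Toy input: one cluster ID and one predicted label per cell.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["CD8_EM", "CD8_EM", "CD8_EM", "CD4_EM", "Treg", "Th1", "Treg", "Th1"]

props = label_proportions(clusters, labels)
summary = dominant_label(props)
```

Clusters that come back as "mixed" are the ones worth checking by hand with canonical markers.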


Thank you @jared.andrews07 and @theHumanBorch for taking the time to reply! Especially since I am currently writing my PhD thesis and heavily citing SingleR and scRepertoire!

One of my worries with the marker-based approach, even if I have a strong marker list for my cells of interest, is that I may be forcing labels onto cells that clustered together based on state rather than subtype (e.g. CD4 Effector Memory clustering with CD8 Effector Memory, or Anergic CD4 with Anergic CD8). With the reference-based approach, on the other hand, the reference itself introduces a bias, so different people looking at the same dataset but trusting different references will come up with different subtypes.

Indeed, this will hopefully improve over time. Until now I've also been using a combination of methods, similar to what @theHumanBorch is showing, but I find it hard to tell which methods, or even which references, should be given more weight when trying to narrow down these fine-grained labels.
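One way to at least quantify the disagreement when combining methods is sketched below. It assumes the labels from each method have already been mapped onto a shared vocabulary (itself a manual step), and the method names, label strings, and helper names are all placeholders. It does not tell you which method is right, only where they agree and which cells have no majority call.

```python
from collections import Counter
from itertools import combinations

def pairwise_agreement(calls):
    """Fraction of cells on which each pair of methods gives the same label."""
    return {
        (a, b): sum(x == y for x, y in zip(calls[a], calls[b])) / len(calls[a])
        for a, b in combinations(sorted(calls), 2)
    }

def consensus(calls, min_votes=2):
    """Per-cell majority label; 'ambiguous' if the top label is tied or has too few votes."""
    out = []
    for votes in zip(*calls.values()):
        ranked = Counter(votes).most_common()
        top_label, top_n = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_n
        out.append(top_label if top_n >= min_votes and not tied else "ambiguous")
    return out

# Toy input: per-cell labels from three hypothetical methods, same cell order.
calls = {
    "method_a": ["CD4_EM", "CD8_EM", "Treg", "CD4_naive"],
    "method_b": ["CD4_EM", "CD8_EM", "CD4_EM", "CD4_CM"],
    "method_c": ["CD4_CM", "CD8_EM", "Treg", "CD4_EM"],
}

agreement = pairwise_agreement(calls)
consensus_labels = consensus(calls)
```

A method that agrees poorly with all the others on a given lineage is a candidate for down-weighting there, though that still needs checking against markers.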


Not a whole lot to do for it - though if you can tie canonical markers (e.g. from flow/IHC that the field typically accepts as truth) to the reference data, that can help you pinpoint which references are decent.
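A rough sketch of that check, assuming the reference cells can be grouped by their label with per-cell expression as gene-to-value dicts. The expected marker patterns, the toy data, and the helper names are illustrative assumptions, and the cutoffs are arbitrary and would need tuning, since dropout can make a real marker look absent.

```python
def fraction_expressing(cells, gene, threshold=0.0):
    """Fraction of cells whose expression of gene exceeds threshold."""
    if not cells:
        return 0.0
    return sum(c.get(gene, 0.0) > threshold for c in cells) / len(cells)

def check_reference(ref_cells_by_label, expected, min_pos=0.6, max_neg=0.2):
    """Per label: True if expected-positive markers are common and expected-negative ones rare."""
    report = {}
    for label, spec in expected.items():
        cells = ref_cells_by_label.get(label, [])
        pos_ok = all(fraction_expressing(cells, g) >= min_pos for g in spec["pos"])
        neg_ok = all(fraction_expressing(cells, g) <= max_neg for g in spec["neg"])
        report[label] = pos_ok and neg_ok
    return report

# Illustrative expectations (assumption), e.g. taken from flow-style definitions.
expected = {
    "CD4 T": {"pos": ["CD3E", "CD4"], "neg": ["CD8A"]},
    "CD8 T": {"pos": ["CD3E", "CD8A"], "neg": ["CD4"]},
}

# Toy reference: cells grouped by the label the reference gives them.
ref_cells_by_label = {
    "CD4 T": [
        {"CD3E": 1.2, "CD4": 0.8},
        {"CD3E": 0.9, "CD4": 1.1},
        {"CD3E": 1.4, "CD4": 0.5},
    ],
    "CD8 T": [
        {"CD3E": 1.0, "CD8A": 1.3},
        {"CD3E": 1.1, "CD4": 0.7},
        {"CD3E": 0.8, "CD4": 0.9, "CD8A": 0.2},
    ],
}

report = check_reference(ref_cells_by_label, expected)
```

Here the toy "CD8 T" label fails because most of its cells express CD4, which is the kind of reference label you would treat with suspicion.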

Like Nick said, there's going to be a manual component, though I'd be inclined to give ProjecTILs a try based on what Nick showed above - I struggled to get labeling that clear for T cell subtypes with most immune references (even if they had labels for such populations).
