Question

Statistical test to compare cell type proportions in scrnaseq data

1

6 months ago

sarahmanderni ▴ 120

Hi,

I want to compare the proportion of cells of interest (CD24+ cells here) across time points (ctrl, DM12, DM24 and DM36) within different cell types (T cell1 and T cell2 here), as shown in the figure, but I am not sure what the optimal approach is. I have 2 replicates per time point, and the CD24+ cells make up a very small proportion of all cells.

I assume I cannot use tools like scCODA and compositional analysis: according to the scCODA paper, I would need more replicates for rare cell types such as mine (8 to 10 samples are recommended when the cell type is rare, depending on the statistical power you are looking for). But how about other tests, like Kruskal-Wallis followed by post-hoc comparisons? Do I have to run any tests in this case?

Thanks!

[Figure not shown]

compositional-analysis scRNA-seq • 635 views

ADD COMMENT • link 6 months ago by sarahmanderni ▴ 120

score 3 · Answer 1 · 2024-06-19

3

6 months ago

ATpoint 86k

Please go through the differential abundance analysis section of OSCA https://bioconductor.org/books/release/OSCA.multisample/differential-abundance.html and see whether this is what you need. If not, then please elaborate.

ADD COMMENT • link 6 months ago by ATpoint 86k

0

Thanks for the link! It is indeed related to what I am looking for. The remaining issue is the number of replicates per group. I know the unofficial rule of thumb for methods like edgeR to be reliable is a minimum of 3 replicates per group, and the example analysis in the OSCA link also uses 3 replicates. Do you think I can still safely use a similar approach with only 2 replicates? I know some methods are said to handle fewer replicates, but I am not sure I can rely on their ability to control errors. Edit: I found this link discussing the number of replicates in these methods. It would still be helpful to know if there is a more optimal approach.

ADD REPLY • link 6 months ago by sarahmanderni ▴ 120

1

edgeR and company require at minimum 2 vs 1. More is of course better, but if you are underpowered, it is better to use a solid framework like this than to cook up custom statistics that might perform poorly. You can gain some power by testing not only the CD24+ cells but all identified cell types / clusters, so that the dispersion estimation becomes more robust. This is just like edgeR normally gaining power from sharing information across many genes, rather than testing every gene in isolation, one by one.
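
To make the "test all clusters, not only CD24+" suggestion concrete, here is a minimal sketch of the cluster-by-sample count table that a count-based differential abundance workflow would take as input. It is not the OSCA code itself: the sample names, cluster labels, cell counts and the helper `abundance_table` are all made up for illustration, and only the Python standard library is used.

```python
from collections import Counter

# Hypothetical per-cell annotations: one (sample, cluster) pair per cell.
# Real data would have thousands of cells and all 8 samples (4 time points x 2 replicates).
cells = [
    ("ctrl_rep1", "T cell1"), ("ctrl_rep1", "T cell1"),
    ("ctrl_rep1", "T cell2"), ("ctrl_rep1", "CD24+"),
    ("DM12_rep1", "T cell1"), ("DM12_rep1", "T cell2"),
    ("DM12_rep1", "T cell2"), ("DM12_rep1", "CD24+"),
    ("DM12_rep1", "CD24+"),
]

def abundance_table(cells):
    """Count cells per cluster and sample; absent combinations are filled with 0."""
    tally = Counter(cells)
    samples = sorted({sample for sample, _ in cells})
    clusters = sorted({cluster for _, cluster in cells})
    counts = {c: {s: tally[(s, c)] for s in samples} for c in clusters}
    return clusters, samples, counts

clusters, samples, counts = abundance_table(cells)

# Rows are clusters, columns are samples, so every cluster contributes
# a row to the dispersion estimate, not only the rare population.
print("cluster".ljust(10) + "".join(s.ljust(12) for s in samples))
for c in clusters:
    print(c.ljust(10) + "".join(str(counts[c][s]).ljust(12) for s in samples))
```

Each row here plays the role that a gene plays in a standard edgeR analysis, which is why keeping every cluster in the table, rather than only the CD24+ row, gives the dispersion estimation more to work with.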