Statistical test to compare cell type proportions in scRNA-seq data
6 months ago
sarahmanderni ▴ 120

Hi,

I want to compare the proportion of cells of interest (CD24+ cells here) across time points (ctrl, DM12, DM24 and DM36) within each cell type (T cell1 and T cell2 here), as shown in the figure, but I am not sure what the optimal approach is. I have 2 replicates per time point, and the proportion of CD24+ cells is very small compared to the total number of cells.

I assume I cannot use tools like scCODA or other compositional analysis methods because, based on the scCODA paper, I would need more replicates for rare cell types such as mine (8 to 10 samples are recommended for a rare cell type, depending on the statistical power you are looking for). But how about other tests, like Kruskal-Wallis followed by post-hoc comparisons? Do I have to run any tests in this case?

Thanks!

[Figure: proportion of CD24+ cells across time points in T cell1 and T cell2; image not available]

compositional-analysis scRNA-seq • 638 views
6 months ago
ATpoint 86k

Please go through the differential abundance analysis section of OSCA https://bioconductor.org/books/release/OSCA.multisample/differential-abundance.html and see whether this is what you need. If not, then please elaborate.
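For orientation, the differential abundance workflow in that chapter starts from a table of cell counts per label and sample, which is then handed to edgeR. Below is a minimal Python sketch of building such a table. The sample names, labels and counts are made up for illustration and are not taken from the post, and `abundance_table` is a hypothetical helper, not part of any library.

```python
from collections import Counter

def abundance_table(cell_annotations, samples, labels):
    """Count cells per (sample, label); combinations never seen become 0."""
    counts = Counter(cell_annotations)
    return {
        label: [counts[(sample, label)] for sample in samples]
        for label in labels
    }

# Hypothetical toy data: one (sample, label) pair per cell.
cells = (
    [("ctrl_rep1", "T cell1 CD24+")] * 3 + [("ctrl_rep1", "T cell1 CD24-")] * 40
    + [("ctrl_rep2", "T cell1 CD24+")] * 2 + [("ctrl_rep2", "T cell1 CD24-")] * 35
    + [("DM12_rep1", "T cell1 CD24+")] * 6 + [("DM12_rep1", "T cell1 CD24-")] * 38
    + [("DM12_rep2", "T cell1 CD24-")] * 42
)
samples = ["ctrl_rep1", "ctrl_rep2", "DM12_rep1", "DM12_rep2"]
labels = ["T cell1 CD24+", "T cell1 CD24-"]

# Rows are labels, columns follow the order of `samples`.
table = abundance_table(cells, samples, labels)
for label in labels:
    print(label, table[label])
```

Each row of this table plays the role that a gene plays in a differential expression analysis, and each column is one replicate.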


Thanks for the link! It is indeed related to what I am looking for. The remaining issue is the number of replicates per group. I know the unofficial rule of thumb for methods like edgeR to be reliable is a minimum of 3 replicates per group, and the example analysis in the OSCA link also uses 3 replicates. Do you think I can still safely use a similar approach with 2 replicates? I know some methods are said to handle fewer replicates, but I am not sure I can rely on their ability to control errors. Edit: I found this link discussing the number of replicates in these methods. It would still be helpful to know whether there is a more optimal approach.


edgeR and company require at minimum 2 vs 1. More is of course better, but if you are underpowered, it is better to use a solid framework like this than to cook up custom stats that might perform poorly. You can gain some power by testing not only the CD24+ cells but all identified cell types / clusters, so that the dispersion estimation becomes more robust. This is just like edgeR normally gaining power from sharing information across many genes, rather than testing every gene in isolation one by one.
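To illustrate why a rank-based route (such as pairwise post-hoc comparisons after Kruskal-Wallis) has little to offer with this design, here is a small Python sketch that enumerates every possible rank assignment for an exact two-sided Wilcoxon rank-sum test, assuming no ties. `min_two_sided_p` is a hypothetical helper written for this illustration, not a function from any statistics package.

```python
from fractions import Fraction
from itertools import combinations

def min_two_sided_p(n1, n2):
    """Smallest exact two-sided p-value of a rank-sum test with no ties."""
    ranks = range(1, n1 + n2 + 1)
    # Rank sum of group 1 under every possible assignment of ranks.
    sums = [sum(combo) for combo in combinations(ranks, n1)]
    expected = Fraction(n1 * (n1 + n2 + 1), 2)
    most_extreme = max(abs(s - expected) for s in sums)
    # Assignments that are at least as extreme as the most extreme one.
    hits = sum(1 for s in sums if abs(s - expected) >= most_extreme)
    return Fraction(hits, len(sums))

print(min_two_sided_p(2, 2))  # 1/3
print(min_two_sided_p(3, 3))  # 1/10
```

With 2 vs 2 replicates, even perfectly separated groups cannot give a pairwise p-value below 1/3, and that is before any multiple-testing correction. A count model that borrows information across cell types does not have this hard floor, although it is still underpowered with so few replicates.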


This totally makes sense. Thank you!
