Question

For mean expression/FC calculations in scRNA-seq, should I use all cells or only expressing cells?


Asked 6.5 years ago by achits

I'm running a differential test in Monocle. Its documentation shows that differentialGeneTest() reports which features differ across your model, but not which specific genes go up in particular groups. Per their documentation: "We could also simply compute summary statistics such as mean or median expression level on a per-CellType basis to see this, which might be handy if we are looking at more than a handful of genes."
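The per-CellType summary suggested in that quote can be sketched as below. This is a minimal illustration in Python (not Monocle's own API), assuming one gene's normalized values in a 1-D array and a parallel array of cell-type labels; `per_group_summary` is a hypothetical helper name and the toy values are made up.

```python
import numpy as np

def per_group_summary(expr, groups):
    """Return {group: (mean, median)} of one gene's normalized expression per group.

    expr   : 1-D array of normalized expression values, one per cell
    groups : 1-D array of group labels (e.g. CellType), same length as expr
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    summary = {}
    for g in np.unique(groups):
        vals = expr[groups == g]  # all cells of this group, zeros included
        summary[str(g)] = (float(vals.mean()), float(np.median(vals)))
    return summary

# Toy data (hypothetical values, not from the question)
expr = [0.0, 2.0, 4.0, 0.0, 0.0, 1.0]
cell_type = ["A", "A", "A", "B", "B", "B"]
print(per_group_summary(expr, cell_type))
```

In group B the median is 0.0 because most of its cells have no detected expression; the median is 0 for any gene detected in fewer than half of a group's cells, which is one reason the all-cells vs. expressing-cells choice matters.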

This makes sense, and I have already calculated a normalized expression matrix. My main question: when calculating mean expression, does one normally use all cells, including those with no detectable expression, or only the cells that express the gene? For example, suppose condition 1 has 400 total cells, 300 of which express geneA, and condition 2 has 200 total cells, only 50 of which express geneA. If I'm calculating a fold change (FC) for geneA, do I compare

mean expression (all 400 cells) / mean expression (all 200 cells), OR

mean expression (300 expressing cells) / mean expression (50 expressing cells)?
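A small sketch makes the difference between the two options concrete. When non-expressing cells have a value of exactly 0 (threshold 0, non-negative normalized values), the all-cells mean equals the fraction of expressing cells times the mean among expressing cells, so FC(all cells) = (p1 / p2) × FC(expressing cells only). The all-cells FC therefore mixes the change in detection rate with the change in expression level, while the expressing-only FC ignores the detection rate. The values below are invented to match the question's cell counts; `fold_changes` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def fold_changes(expr1, expr2, threshold=0.0):
    """Compare two mean-based fold changes for one gene.

    A cell counts as "expressing" if its value is > threshold (an assumed
    cutoff; 0 means any detected expression). If a condition has no
    expressing cells, the expressing-only mean is nan (with a warning).
    Returns (fc_all_cells, fc_expressing_only, frac_expr1, frac_expr2).
    """
    e1 = np.asarray(expr1, dtype=float)
    e2 = np.asarray(expr2, dtype=float)
    on1, on2 = e1[e1 > threshold], e2[e2 > threshold]
    fc_all = e1.mean() / e2.mean()      # option 1: all cells
    fc_expr = on1.mean() / on2.mean()   # option 2: expressing cells only
    return fc_all, fc_expr, on1.size / e1.size, on2.size / e2.size

# Made-up values: every expressing cell has expression 2.0 in both
# conditions, so only the fraction of expressing cells differs.
cond1 = np.r_[np.full(300, 2.0), np.zeros(100)]  # 300 of 400 cells express geneA
cond2 = np.r_[np.full(50, 2.0), np.zeros(150)]   # 50 of 200 cells express geneA
fc_all, fc_expr, p1, p2 = fold_changes(cond1, cond2)
print(fc_all, fc_expr, p1, p2)  # prints 3.0 1.0 0.75 0.25
```

Here the expressing-only FC is 1.0, while the all-cells FC is 3.0, entirely because 75% vs. 25% of cells express geneA.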

I can see how either approach could introduce bias, so I wonder which one is standard in the field.

Tags: scRNA-seq, Monocle, next-gen


Answer (2018-05-07)

It is probably a good idea to do some extra QC filtering (for example, requiring a minimum number of detected genes per cell and removing cells with a high percentage of mitochondrial reads), but the criteria that can or should be applied will likely vary between projects.
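As a rough illustration of those two filters, here is a sketch on a cells × genes matrix of raw counts. The thresholds (`min_genes=200`, `max_mito_frac=0.10`) and the "MT-" gene-name prefix for mitochondrial genes are assumptions for illustration only, not recommended values; as noted above, suitable cutoffs vary between projects.

```python
import numpy as np

def qc_keep_mask(counts, gene_names, min_genes=200, max_mito_frac=0.10,
                 mito_prefix="MT-"):
    """Boolean mask of cells passing two simple QC filters.

    counts     : cells x genes array of raw counts
    gene_names : gene symbols, one per column
    Thresholds and the mitochondrial prefix are illustrative assumptions.
    """
    counts = np.asarray(counts)
    genes_detected = (counts > 0).sum(axis=1)  # genes with >= 1 read per cell
    # Case-insensitive prefix match, so "mt-Co1" (mouse) also counts
    is_mito = np.array([g.upper().startswith(mito_prefix.upper())
                        for g in gene_names])
    total = counts.sum(axis=1)
    # Fraction of each cell's reads on mitochondrial genes (0 for empty cells)
    mito_frac = np.divide(counts[:, is_mito].sum(axis=1), total,
                          out=np.zeros(len(total)), where=total > 0)
    return (genes_detected >= min_genes) & (mito_frac <= max_mito_frac)

# Tiny made-up example with thresholds scaled down to fit 4 genes
genes = ["GAPDH", "ACTB", "CD3E", "MT-CO1"]
counts = np.array([
    [5, 3, 0, 1],   # 3 genes detected, mito fraction 1/9: kept
    [0, 0, 0, 4],   # only 1 gene detected: removed
    [2, 2, 1, 10],  # mito fraction 10/15: removed
])
mask = qc_keep_mask(counts, genes, min_genes=2, max_mito_frac=0.2)
print(mask)
```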

I'm not sure how easy it is to do this with Monocle (or which specific functions to recommend). However, some other potential options would be:

1) Use raw counts for p-values (with relatively standard RNA-Seq methods like edgeR or limma-voom, or scRNA-Seq-specific methods like MAST), and use CPM values to calculate fold-changes (or some other normalized count, if the goal is to have something comparable to what the differential expression program reports).
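For the fold-change half of option 1, a minimal sketch of CPM normalization and a per-gene log2 fold change is below (the p-values would still come from a tool such as edgeR, limma-voom, or MAST, which is not shown). It averages per-cell CPM within each group, which is one choice among several (pooling counts per group before normalizing is another). The `pseudocount` used to avoid log(0) is an assumption, not a standard value, and the function names are hypothetical.

```python
import numpy as np

def cpm(counts):
    """Counts-per-million for a cells x genes matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)  # total reads per cell
    return counts / lib_size * 1e6

def log2_fc_cpm(counts, group, pseudocount=1.0):
    """Per-gene log2(mean CPM in group True / mean CPM in group False).

    group is a boolean array over cells (rows). The pseudocount is an
    illustrative assumption that avoids log(0) for genes absent in a group.
    """
    c = cpm(counts)
    group = np.asarray(group, dtype=bool)
    m1 = c[group].mean(axis=0)   # mean CPM over all cells in group 1
    m2 = c[~group].mean(axis=0)  # mean CPM over all cells in group 2
    return np.log2((m1 + pseudocount) / (m2 + pseudocount))

# Made-up counts for 4 cells x 2 genes; first two cells are group 1
counts = np.array([[6, 2], [2, 2], [1, 3], [1, 3]])
group = np.array([True, True, False, False])
lfc = log2_fc_cpm(counts, group, pseudocount=0.0)
print(lfc)
```

Note that averaging over all cells in each group, as here, corresponds to the "all cells" option from the question.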

2) Use Seurat scaled expression for the fold-change calculation, and potentially use standard statistical tests (like lm() for linear regression or aov() for ANOVA) to test for differential expression between groups of cells.
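A rough Python analogue of option 2 is sketched below: per-gene centering and scaling (only loosely mimicking Seurat's ScaleData, which also has options this sketch ignores, such as clipping large values and regressing out variables), followed by the one-way ANOVA F statistic that aov() would report for a single gene. Both function names are hypothetical; scipy.stats.f_oneway computes the same F statistic along with a p-value. Because scaled values are centered on zero and can be negative, the group contrast on this scale is naturally a difference of means rather than a ratio.

```python
import numpy as np

def scale_genes(expr):
    """Center and scale each gene (column) to mean 0, SD 1 across cells.

    A loose approximation of per-gene scaling, not Seurat's exact procedure.
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant genes
    return (expr - expr.mean(axis=0)) / sd

def one_way_anova_f(values, groups):
    """One-way ANOVA F statistic for one gene across groups of cells."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand = values.mean()
    labels = np.unique(groups)
    ss_between = sum(values[groups == g].size
                     * (values[groups == g].mean() - grand) ** 2
                     for g in labels)
    ss_within = sum(((values[groups == g] - values[groups == g].mean()) ** 2).sum()
                    for g in labels)
    df_between = labels.size - 1
    df_within = values.size - labels.size
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up expression for 6 cells x 2 genes in two groups
grp = np.array(["A", "A", "A", "B", "B", "B"])
raw = np.array([[1., 0.], [2., 0.], [3., 0.], [5., 1.], [6., 1.], [7., 1.]])
z = scale_genes(raw)
f_raw = one_way_anova_f(raw[:, 0], grp)
f_scaled = one_way_anova_f(z[:, 0], grp)
print(f_raw, f_scaled)
```

The F statistic is unchanged by per-gene centering and scaling, so the test result is the same on raw or scaled values; only the effect-size scale differs.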