Question

Using the Seurat function `FindMarkers` to find differentially expressed genes between a normal group and a treatment group within one specific cell type cluster, but the avg_log2FC results look weird?

(Asked by FantasticAI, 3.6 years ago)

I have scRNA-seq data from two groups, and I have finished cell type annotation. Now I would like to find the differentially expressed genes (DEGs) between the two condition groups (normal vs. treatment) within one cell type cluster, so I used the Seurat function FindMarkers as follows:

Alveolar.macrophages.response <- FindMarkers(normal.vs.Veh, ident.1 = "Alveolar macrophages_Normal", ident.2 = "Alveolar macrophages_Veh", verbose = FALSE)

However, I have some concerns about the results returned by FindMarkers. I used head(Alveolar.macrophages.response[with(Alveolar.macrophages.response, order(avg_log2FC, decreasing = TRUE)), ], 5) to display the results in decreasing order of avg_log2FC. The avg_log2FC values are in general very small: even the largest is only 0.9793757, which looks very weird to me compared with what I have seen in many other tutorials. Is this result reasonable?

                       p_val avg_log2FC pct.1 pct.2     p_val_adj
Tm4sf19        3.630080e-153  0.9793757 0.203 0.079 7.161784e-149
Slfn4           8.873470e-65  0.6886944 0.398 0.290  1.750647e-60
LOC100360087   9.405795e-295  0.6689990 0.935 0.839 1.855669e-290
Fth1           4.259850e-230  0.5818955 0.963 0.896 8.404258e-226
Fkbp5          7.286612e-134  0.5672482 0.406 0.249 1.437576e-129
Hmox1           1.944299e-91  0.5299588 0.291 0.174  3.835907e-87
Zdhhc14         9.329413e-77  0.5016266 0.567 0.455  1.840600e-72

Tags: Seurat, FindMarkers, scRNA

Do you have biological replicates? If so, you could aggregate counts into pseudobulks and use standard tools such as DESeq2. In my experience, DEG analysis at the single-cell level is a mess, because the prevalence of zeros makes the fold-change estimates biased or useless, especially for genes that are not highly expressed, where zeros are even more prevalent.

(Comment by ATpoint, 3.6 years ago)
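To see why single-cell fold changes can come out small, here is a minimal Python sketch of how avg_log2FC is computed, assuming Seurat v4 defaults on log-normalized data: back-transform with expm1, average over all cells of each group (zeros included), add a pseudocount of 1, then take log2. This is an assumption about Seurat's behaviour, not Seurat code; other versions may place the pseudocount differently, so check FoldChange() in the version you run. The two example genes are made up.

```python
import math


def avg_log2fc(norm_1, norm_2, pseudocount=1.0):
    """Sketch of a Seurat-v4-style avg_log2FC for one gene (assumption, not Seurat code).

    norm_1, norm_2: log-normalized values, i.e. log1p(counts / total * 10000),
    for the cells of group 1 (ident.1) and group 2 (ident.2).
    """
    def log2_mean(values):
        # Back-transform to counts per 10k, average over ALL cells (zeros included),
        # then add the pseudocount before taking log2.
        linear = [math.expm1(v) for v in values]
        return math.log2(sum(linear) / len(linear) + pseudocount)

    return log2_mean(norm_1) - log2_mean(norm_2)


# Made-up low-expression gene: mean 0.8 vs 0.4 counts per 10k (a true 2-fold difference)
low_1 = [math.log1p(4.0)] * 2 + [0.0] * 8
low_2 = [math.log1p(4.0)] * 1 + [0.0] * 9
# Made-up high-expression gene: mean 80 vs 40 counts per 10k (also 2-fold)
high_1 = [math.log1p(80.0)] * 10
high_2 = [math.log1p(40.0)] * 10

print(round(avg_log2fc(low_1, low_2), 3))    # log2(1.8 / 1.4), about 0.363
print(round(avg_log2fc(high_1, high_2), 3))  # log2(81 / 41), about 0.982
```

Both genes differ 2-fold in mean expression, yet the low-expression gene gets an avg_log2FC of only about 0.36, because the pseudocount dominates its tiny group means, which are pulled down by all the zero cells.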

Yes, I do have biological replicates in each group. To aggregate counts, do you mean using the Seurat merge function?

(Reply by FantasticAI, 3.6 years ago)

score 2 · Answer 1 · 2021-06-04

Not merge; the idea is to sum the counts. I am not a Seurat user, but see, for example:

https://hbctraining.github.io/scRNA-seq/lessons/pseudobulk_DESeq2_scrnaseq.html

https://bioconductor.org/books/release/OSCA/multi-sample-comparisons.html#differential-expression-between-conditions => section 14.3.1

For Seurat, you would probably need to access the slot with the raw UMI counts and then sum these per gene over all cells of each sample within a condition (or group, or whatever your setup is). If you have, say, two biological replicates (n = 2) for each of two conditions, you would get four pseudobulks, condition1_rep1/2 and condition2_rep1/2: basically a normal count matrix with four columns and as many rows as you have genes. Then just feed this into DESeq2, edgeR, or you-name-it-tool-for-DEG-analysis :)
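The summing step described above can be sketched in plain Python. This is a minimal illustration of the idea, not Seurat or scuttle code: raw counts for the cells of one cluster are summed per gene within each sample. The toy counts and sample labels such as "Normal_rep1" are hypothetical; in practice the per-cell sample label would come from your cell metadata.

```python
from collections import defaultdict


def pseudobulk(counts, cell_samples):
    """Sum raw UMI counts per gene over all cells of the same sample.

    counts: dict mapping gene -> list of raw counts, one entry per cell
            (cells already restricted to one cluster, e.g. alveolar macrophages)
    cell_samples: list with one sample label per cell, e.g. "Normal_rep1"
    Returns: dict mapping sample -> {gene: summed count}, i.e. one pseudobulk
             column per sample.
    """
    sums = defaultdict(lambda: defaultdict(int))
    for gene, per_cell in counts.items():
        if len(per_cell) != len(cell_samples):
            raise ValueError(
                f"{gene}: expected {len(cell_samples)} cells, got {len(per_cell)}"
            )
        for count, sample in zip(per_cell, cell_samples):
            sums[sample][gene] += count
    return {sample: dict(genes) for sample, genes in sums.items()}


# Hypothetical toy data: 6 cells from 2 replicates x 2 conditions
counts = {"Fth1": [3, 0, 5, 2, 1, 0], "Hmox1": [0, 1, 0, 2, 0, 0]}
cell_samples = ["Normal_rep1", "Normal_rep1", "Normal_rep2",
                "Veh_rep1", "Veh_rep2", "Veh_rep2"]
for sample, genes in pseudobulk(counts, cell_samples).items():
    print(sample, genes)
```

Raw counts are summed rather than normalized values because DESeq2 and edgeR expect integer counts and do their own normalization.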

There are also conversion functions from Seurat to SingleCellExperiment (e.g. as.SingleCellExperiment()), so with an SCE you can follow the code in the OSCA book and use aggregateAcrossCells() from the scuttle package, which does the aggregation for you.

The problem with DEG analysis at the single-cell level (treating each cell as a replicate) is that significance is hilariously inflated, i.e. p-values become super tiny, because of the large n, even when fold changes (effect sizes) are small or tiny; on top of that come the fold-change issues I mentioned in my comment above. Cells from the same sample are not independent biological replicates, so this makes little sense (to me), especially if you have true biological replicates you could use instead. The pseudobulk strategy unlocks the standard DEG tools, which are built on many years of biostatistics research, while DEG analysis for single cells is still relatively young.
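To illustrate the large-n point, here is an idealised sketch: a two-sample z-test with a known standard deviation, which is a simplification and not the Wilcoxon test that FindMarkers uses by default. The effect size stays fixed at 0.1 standard deviations and only the number of cells per group changes. Like single-cell DEG, it treats every cell as an independent observation, which is exactly the assumption that does not hold for cells from the same animal.

```python
import math


def z_test_p(delta, n_per_group, sd=1.0):
    """Two-sided p-value of a two-sample z-test for a mean difference `delta`,
    assuming a known, equal standard deviation `sd` and n observations per group."""
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z = abs(delta) / se
    return math.erfc(z / math.sqrt(2.0))    # equals 2 * (1 - Phi(z))


# Same tiny effect (0.1 SD), ever more cells treated as "replicates"
for n in (10, 100, 1000, 10000):
    print(n, z_test_p(0.1, n))
```

With 10 cells per group the difference is nowhere near significant (p about 0.82), while with 10,000 cells per group the same tiny difference gives p about 1.5e-12.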