Question

Best choices for DGE and pathway enrichment analysis in single cell data using pseudobulk?

0

Entering edit mode

17 days ago

txema.heredia ▴ 210

Hi,

I am analyzing a single cell dataset. It is a mouse model of disease. 2 WT/control + 2 mutant/disease samples. I have already run the usual QC, clustering, cell type annotation, trajectory analysis, etc.

We have found a cell type that is only present in the disease samples. With multiple subtypes and some relationship with other "healthy" cell types (present still in the same disease samples). We want to further look into these cells and try to find the DEG and pahtways associated with these clusters.

I have previously looked into Seurat's FindMarkers and some single-cell methods for pathway enrichment, with disappointing results.

AFAIK, the consensus is that the best results for single cell data can be obtained by transforming the single cell data into pseudobulk and running "traditional" bulk RNA-seq methods on those pseudosamples.

I have been following this paper Confronting false discoveries in single-cell differential expression, where it seems that the best results are obtained by either edgeR or DESeq2, both using their LRT methods. However, as far as I understand, this paper benchmarked perturbation analyses across cell types. I am more interested, however, in DEG between specific cell types (pairwise) within the same samples.

Data

The data I have:

11 clusters. 3 cell types (A,B,C) with 3, 6, and 2 subtypes respectively.
- I tested this on just 2 comparisons, A1 vs A2, and B1 vs B2.
2 disease samples/biological replicates. I won't use the WT/controls because they lack our celltype of interest B.
pseudosamples generated just adding up the raw counts with Matrix.utils::aggregate.Matrix(... , fun="sum")

Parameters:

The different parameters I tested:

Software + method:
- DESeq2
  - Wald test
  - LRT
- edgeR.
  - exact test
  - QLF
  - LRT
design:
- ~cluster
- ~cluster + sample
Samples in object:
- Only the 2+2 pseudosamples (2 clusters * 2 biological samples) for the clusters in the comparison.
- All 11 clusters * 2 bio samples (22 total) <- I did this only for DESeq2 because I don't know how to do it in edgeR.
Reduced model:
- when using LRT with a ~cluster + sample design:
  - reduced = ~1
  - reduced = ~sample <- I did this only for DESeq2 because I don't know how to do it in edgeR.
logFC Shrinkage:
- none
- yes
  - DESeq2: lfcShrink(..., type="apeglm", ...)
  - edgeR: prior.count=5

Pathway enrichment analysis:
- fgsea (using log2FC as "ranking" variable in all genes, not only significant DEG)
  - GO:BP
  - GO:MF

(I have used DESeq2 a lot, but this is my first time using edgeR, so I might be doing something wrong.)

Results

And this is the number of significant up/down-DEG and enriched pathways in each method/combination of parameters:

                                                                                                          A1 vs A2                                    B1 vs B2
                                                                                               DEG          GO:bp         GO:mf            DEG          GO:bp         GO:mf
package   data in object    log2FC shrink.   method                  design               Down     Up   Down     Up   Down     Up     Down     Up   Down     Up   Down     Up
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DESeq2    all               apeglm           LRT reduced=~1          ~cluster              3212   3053     33     68     33     35|     175    232     24     83      7     19
DESeq2    all               apeglm           LRT reduced=~1          ~cluster + sample     3606   3502     31     60     30     31|     754    569     30     76     10     17
DESeq2    all               apeglm           LRT reduced=~sample     ~cluster + sample     3517   3419     33     63     31     31|     754    569     30     79      9     18
DESeq2    all               apeglm           Wald                    ~cluster              1338   1170     33     70     32     33|     175    232     24     84      7     18
DESeq2    all               apeglm           Wald                    ~cluster + sample     2052   1657     30     61     32     29|     754    569     34     83     10     19
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DESeq2    only comparison   apeglm           LRT reduced=~1          ~cluster               747    706     37    128     25     44|      68    163     14     76      8     22
DESeq2    only comparison   apeglm           LRT reduced=~1          ~cluster + sample      808    457     36    122     27     43|      79    104     13     65      7     21
DESeq2    only comparison   apeglm           LRT reduced=~sample     ~cluster + sample      906    620     35    117     26     41|      57    127     12     61      7     21
DESeq2    only comparison   apeglm           Wald                    ~cluster               748    679     36    126     30     44|      68    160     13     72      7     22
DESeq2    only comparison   apeglm           Wald                    ~cluster + sample      907    619     33    125     26     41|      63    131     13     65      9     24
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
DESeq2    only comparison   no               LRT reduced=~1          ~cluster               747    706      8      3     11      4|      75    163     55      4      8      0
DESeq2    only comparison   no               LRT reduced=~1          ~cluster + sample      812    459      5      2      4      2|      87    107     62      6     10      0
DESeq2    only comparison   no               LRT reduced=~sample     ~cluster + sample      906    620      5      2      6      2|      60    127     65      7     10      0
DESeq2    only comparison   no               Wald                    ~cluster               748    679      8      3     12      4|      75    160     59      6      8      0
DESeq2    only comparison   no               Wald                    ~cluster + sample      907    619      5      2      4      2|      68    131     65      7     10      0
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
edgeR     only comparison   no               exact                   ~cluster               685   1545      5      2      3      5|     328    396     51      1      7      3
edgeR     only comparison   no               exact                   ~cluster + sample     1723   2369      5      3      3      5|     561    750     38      1      5      3
edgeR     only comparison   no               LRT                     ~cluster               793   1785      6      2      3      5|     118    178     58      1      7      3
edgeR     only comparison   no               LRT                     ~cluster + sample     1782   2556      5      3      2      5|     110    215     43      1      5      3
edgeR     only comparison   no               QLF                     ~cluster               441   1139      6      2      3      5|     196    352     52      1      6      3
edgeR     only comparison   no               QLF                     ~cluster + sample     1539   2076      5      2      3      5|     471    761     51      1      7      3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
edgeR     only comparison   prior.count=5    exact                   ~cluster               685   1545     19     25     13     16|     328    396     45      4     17      6
edgeR     only comparison   prior.count=5    exact                   ~cluster + sample     1723   2369     22     36     18     19|     561    750     37      3     16      8
edgeR     only comparison   prior.count=5    LRT                     ~cluster               793   1785     21     30     13     16|     118    178     53      6     14      6
edgeR     only comparison   prior.count=5    LRT                     ~cluster + sample     1782   2556     18     28     14     17|     110    215     39      3     16      6
edgeR     only comparison   prior.count=5    QLF                     ~cluster               441   1139     21     30     17     17|     196    352     39      3     18      6
edgeR     only comparison   prior.count=5    QLF                     ~cluster + sample     1539   2076     17     26     13     17|     471    761     45      3     13      6

I notice several trends:

The overlap of DEGs is pretty decent across all methods (data not shown). Most differences are driven by some combinations of parameters reporting a widely different number of DEG. However, the genes tend to be "in line".
- The overlap of down-regulated genes can be weaker when comparing results from DESeq2 and edgeR. The up-regulated genes show better overlaps.
- Similarly, the overlap of pathways looks good, with differences depending on the number reported.
Using all samples in the data object has a big effect in the A1 vs A2 comparison, but very little in the B1 vs B2. The pathways in A1 vs A2 are useless, with many terms too general and not related at all with the tissue/cell type studied.
When using only the 2 cell types of interest in the data object, edgeR returns more DEG than DESeq2.
Method:
- DESeq2: LRT returns more DEG than Wald when using all pseudosamples in the data object, but an almost identical number when using only those of the specific comparison.
- edgeR: when using ~cluster as the design, LRT > exact > QLF. However, when including sample as the covariate, the differences become much smaller.
Using logFC shrinkage allows fgsea to report more significantly enriched pathways. This is not surprising, as I am using logFC to rank genes there. Moving the log2FC of lowly-expressed genes towards 0 reduces the noise in fgsea and returns "better" pathways.
- However, in the B1 vs B2 comparison, using shrinkage reduced the number of pathways in both DESeq2 and edgeR.
The choice of design/covariates has a big impact on DESeq2 when using all pseudosamples, but very small when using only the comparison of choice.
- using ~cluster + sample has a much bigger effect in the number of DEG inedgeR than in DESeq2. However, it has almost no effect in the pathways.
edgeR returns many more pathways than DESeq2 in the B1 vs B2 comparison. These 2 subtypes are much more similar than A1vsA2.

Questions:

Do you see anything striking in these results?

We are mostly interested in studying the differences of B, from the context of A. A1 and A2 are cell types already present in healthy samples (we saw that 100% in our data), while B is a cell type/profile derived from A and caused by disease. We want to test if B1 and B2 clusters are derived from "diseased" A1 and A2 cells respectively. We want to see if the differences (DEG, pahtways) between B1 and B2 are similar to the differences between A1 and A2. As well as infer what is happening in cells of B.

Seeing the different results of the different methods/parameters in A1vsA2 and B1vsB2 comparisons, which would you advise us to chose?

Why is edgeR reporting so many DEG compared to DESeq2 in the B1vsB2 comparison? Is is more sensitive to small differences? Is it more prone to false positives?
Should we use all (22) pseudosamples in the data object, or only the 2+2 of the comparison of interest?
Should we use a simple design ~cluster, or is it advisable to use ~cluster + sample for pseudobulk analyses trying to find differences between cell types? One of our samples always has more cells (of all cell types) than the other. Should we also include the number of cells per sample as covariates?
Is it worth using LRT methods over Wald/Exact for simple pairwise comparisons? Is it more correct to use reduced=~1 or reduced=~sample in this type of analysis?
I wouldn't mind running both DESeq2 and edgeR, and use only the DEG and pathways that are common in the results of both methods. However, how should I match the choice of parameters in both packages?

Do you have any advice on how should I proceed?

A ton of thanks in advance,

Txema

DEG GSEA single-cell pseudobulk • 432 views

ADD COMMENT • link 13 days ago by txema.heredia ▴ 210

score 5 · Answer 1 · 2025-04-29

I appreciate that these sorts of benchmarking papers exist to get an overview, but I will die on my hill saying that the specific combination of tools and these scores that are calculated there always depend on the dataset and situation, so I would not generally say something like "DESeq2+LTR+something" is best. I agree generally that also in my hands pseudobulking (if experimental design allows and sample size and depth is good) feels more robust that DE on pure single-cell level.

That having said, I personally for some time now almost exclusively use limma (voom or trend depending on situation) since it is the most generic one of common choices. I pair this with filtering by percent expression, meaning that I only test genes detected in >= x % of cells of at least one group in a pariwise comparion. Usually x is somewhere between 10-25%. That in my hands protects strongly from spurious calls stemming from outliers and reduces multiple testing burden a lot.

I would (if possible) always do paired tests, so e.g. within donor, within mouse etc, as this is most powerful. I typically use a test against a fold change (so not LRT-like tests) and I test against a fold change > 0 to avoid significant genes with small effect size. In limma/edgeR this is done via treat/glmTreat. Choice of minimum FC is arbitrary but something in the ballpart of 1.1-1.5 usually does the trick to emphasize what in my head are "meaningful" changes, avoiding fold change filtering which favour low-count genes (many threads where the limma/edgeR author recommends against fold change filtering).

Individual Qs:

Why is edgeR reporting so many DEG compared to DESeq2 in the B1vsB2 comparison? Is is more sensitive to small differences? Is it more prone to false positives?

Generally they perform at least similar, one would need to see code, data and prefiltering strategy to decide. After all, hard cutoffs can lead to overestimation of differences, you should plot plvalues and fold changes against each other to see how they correlate if you really want a direct comparison. I would check magnitude of fold changes and prioritize by fold change (as I discuss above) to avoid many DEGs hard to interpret.

Should we use all (22) pseudosamples in the data object, or only the 2+2 of the comparison of interest?

The usual "all together" vs "separate" discussion. For single-cell I usually run separate because we often assay tissues or heterogeneous populations so the mean-variance trend when using all samples is probably not too representative for each individual comparison.

Should we use a simple design ~cluster, or is it advisable to use ~cluster + sample for pseudobulk analyses trying to find differences between cell types? One of our samples always has more cells (of all cell types) than the other. Should we also include the number of cells per sample as covariates?

When paired analysis is possible I would do that, as it can eliminate donor variations etc and is more powerful, allowing to do more aggressive fold change testing as said above.

One of our samples always has more cells (of all cell types) than the other. Should we also include the number of cells per sample as covariates?

I am not sure how n of cells as covariate is behaving, but it's true that number of cells is important as fewer cells per pseudobulk gives lower counts, and this can not always be eliminated by the usual normalization methods. I usually run a subsampling function and compare DE numbers when using all cells or when subsampled to the smallest pseudobulk to see if absence if DEGs is due to low counts or "real".

Is it worth using LRT methods over Wald/Exact for simple pairwise comparisons? Is it more correct to use reduced=~1 or reduced=~sample in this type of analysis?

In my head testing against a fold change is more intuitive but this is probably because I have no formal stats education. Avoiding the garden of forked paths I never even explore LRT-like tests because after all, what is the strong and reproducible benchmark to decide for one over the other? So I simply stick to fold change testing which is the default for both DESeq2 (Wald) and edgeR (glmQLFTest as per their user guide).

I wouldn't mind running both DESeq2 and edgeR, and use only the DEG and pathways that are common in the results of both methods. However, how should I match the choice of parameters in both packages?

Ask yourself if inflating the burden of interpreting DEGs by running two tools is worth it. Do you plan to combine into a meta-analysis or just keep both lists? I would not do that. I assume you want to reveal interesting biology and not bother with benchmarking tools, no? Choose one and stick with it. Both edgeR and DESeq2 are well-established and there is not (to my knowledge) a reason to favour one over the other in general, as pseudobulk is basically bulk RNA-seq (with lower counts) and both tools work well with it, many benchmarks have shown it.