Question

Best practice for gene ontology/pathway/geneset enrichment in 10x data

7 months ago

yura.grabovska ▴ 760

I have a large 10x dataset and have generated a set of around 10 clusters. Initially, I ran FindAllMarkers() from the Seurat package and used the result as input for GSEA with the C2 and C5 MSigDB collections. However, the resulting enrichments are largely uninterpretable for many of the clusters.

Is this approach the best practice for what I am trying to do? Can people suggest alternatives? I accept that I might need to refine my clustering strategy, but I have run both Seurat clustering and cNMF on the data and got largely the same result.

R Seurat scRNA • 833 views

updated 7 months ago by jared.andrews07 ★ 18k • written 7 months ago by yura.grabovska ▴ 760

What is your goal? Are you certain the clusters actually represent distinct cell types or states? How many genes are you getting out from marker finding? Do you have replicate samples?

7 months ago by jared.andrews07 ★ 18k

The dataset consists of cells treated with a drug across 3 timepoints. Each timepoint has 3 replicates. Using AddModuleScore() for signatures of interest and plotting signature features on the UMAP shows different states across the dataset and across clusters.

The goal is to split the dataset by timepoint, identify clusters occurring individually in each timepoint, then identify similar clusters using Jaccard similarity and map how the presence/absence/proportion of each expression programme changes over time.
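
The Jaccard matching step could be sketched roughly as follows, in plain Python purely for illustration. The cluster names and gene sets are made up, and comparing clusters by their sets of top marker genes is an assumption; the same idea applies to whatever per-cluster gene sets are actually used.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two gene sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(markers_t1, markers_t2):
    """For each cluster at one timepoint, find the most similar cluster at another."""
    matches = {}
    for c1, genes1 in markers_t1.items():
        scored = [(jaccard(genes1, genes2), c2) for c2, genes2 in markers_t2.items()]
        score, c2 = max(scored)  # highest similarity wins
        matches[c1] = (c2, score)
    return matches


# Hypothetical marker sets per cluster at two timepoints
t1 = {"t1_c0": {"MKI67", "TOP2A", "CDK1"}, "t1_c1": {"JUN", "FOS", "ATF3"}}
t2 = {"t2_c0": {"JUN", "FOS", "EGR1"}, "t2_c1": {"MKI67", "TOP2A", "CCNB1"}}

matches = best_matches(t1, t2)
for cluster, (partner, score) in matches.items():
    print(cluster, "->", partner, round(score, 2))
```

In practice a minimum similarity cutoff would probably be needed so that a cluster with no real counterpart at the other timepoint is not forced into a match.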

7 months ago by yura.grabovska ▴ 760

Having recently done something similar, I found it easier to integrate samples prior to clustering/annotating or doing DE. But it really depends on what your data looks like. Getting clusters of similar cell states between timepoints made things much easier in my experience, as you can still go look for DE genes in each cluster between timepoints, etc.

7 months ago by jared.andrews07 ★ 18k

score 0 · Answer 1 · 2024-09-19

7 months ago

bk11 ★ 3.1k

You may consider using a pseudo-bulk analysis approach with the gene set tests implemented in limma if your datasets have a complex experimental design. You can check the discussion in Single-cell best practices.
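
As a rough illustration of what pseudo-bulking means, the sketch below sums raw counts over all cells that share the same replicate and cluster. It is plain Python on toy data, not the limma or Seurat implementation; the cell, sample, and gene names are hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, sample_of, cluster_of):
    """Sum raw counts over all cells sharing the same (sample, cluster)."""
    bulk = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        key = (sample_of[cell], cluster_of[cell])
        for gene, n in gene_counts.items():
            bulk[key][gene] += n
    return {key: dict(genes) for key, genes in bulk.items()}


# Hypothetical toy data: cell -> {gene: raw UMI count}
counts = {
    "cell1": {"GeneA": 3, "GeneB": 0},
    "cell2": {"GeneA": 1, "GeneB": 2},
    "cell3": {"GeneA": 0, "GeneB": 5},
}
sample_of = {"cell1": "t0_rep1", "cell2": "t0_rep1", "cell3": "t0_rep2"}
cluster_of = {"cell1": "c0", "cell2": "c0", "cell3": "c0"}

profiles = pseudobulk(counts, sample_of, cluster_of)
for key, genes in sorted(profiles.items()):
    print(key, genes)
```

Each (replicate, cluster) profile then behaves like one bulk RNA-seq sample, so the replicates rather than the individual cells become the unit of analysis for the downstream differential expression and gene set tests.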

7 months ago by bk11 ★ 3.1k

I'm currently running a pseudo-bulk analysis following the relevant section of the Seurat vignette to see what that gives me.

7 months ago by yura.grabovska ▴ 760

Pseudobulk is definitely the way to go. FindAllMarkers is just not very robust in my opinion, and the bulk methods work well when counts are high enough.

7 months ago by jared.andrews07 ★ 18k