I have a large 10x dataset and I have generated a set of clusters (around 10). Initially, I ran FindAllMarkers() from the Seurat package and used the result as input for GSEA with the C2 and C5 MSigDB collections. However, the resulting enrichments are largely uninterpretable for many of the clusters.
Is this approach the best practice for what I am trying to do? Can people suggest alternatives? I accept that I might need to refine my clustering strategy, but I have run both Seurat clustering and cNMF on the data and got largely the same result.
What is your goal? Are you certain the clusters actually represent distinct cell types or states? How many genes are you getting out from marker finding? Do you have replicate samples?
The dataset is cells treated with a drug across 3 timepoints. Each timepoint has 3 replicates. Using AddModuleScore() for signatures of interest and plotting signature features on the UMAP shows different states across the dataset and across clusters.
The goal is to split the dataset by timepoint, identify the clusters occurring within each timepoint individually, then match similar clusters across timepoints using Jaccard similarity and map how the presence/absence/proportion of each expression programme changes over time.
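To make the matching step concrete, here is a minimal sketch of what I mean by Jaccard similarity between clusters. It assumes each cluster is summarised as a set of marker genes (for example the significant genes from FindAllMarkers(), exported per timepoint). The function names, the 0.3 threshold and the toy gene lists are all made up for illustration and are not part of any package.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two gene collections."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as having no similarity (an arbitrary choice).
    return len(a & b) / len(union) if union else 0.0


def match_clusters(markers_t1, markers_t2, threshold=0.3):
    """Compare every cluster in one timepoint with every cluster in another.

    markers_t1 / markers_t2: dict mapping cluster label -> set of marker genes.
    Returns (cluster_t1, cluster_t2, score) tuples with score >= threshold,
    sorted from most to least similar. The threshold is a hypothetical default.
    """
    pairs = []
    for (c1, g1), (c2, g2) in product(markers_t1.items(), markers_t2.items()):
        score = jaccard(g1, g2)
        if score >= threshold:
            pairs.append((c1, c2, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy marker sets, invented for illustration only.
day0 = {"0": {"MKI67", "TOP2A", "CDK1"}, "1": {"JUN", "FOS", "ATF3"}}
day3 = {"0": {"MKI67", "TOP2A", "CCNB1"}, "1": {"HSPA1A", "DNAJB1"}}

matches = match_clusters(day0, day3)
print(matches)  # [('0', '0', 0.5)]
```

In this toy example only cluster 0 at the first timepoint and cluster 0 at the second share enough markers to be paired (2 shared genes out of 4 in the union). Clusters with no partner above the threshold would be the candidates for programmes that appear or disappear over time.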
Having recently done something similar, I found it easier to integrate samples prior to clustering/annotating or doing DE. But it really depends on what your data looks like. Getting clusters of similar cell states between timepoints made things much easier in my experience, as you can still go look for DE genes in each cluster between timepoints, etc.