8 days ago
Hi all,
I am trying to run an fgsea analysis using the fgsea package in R for all cluster biomarkers in my scRNA-seq dataset. I have written a short script to do this. However, all the resulting pathways have a very high adjusted p-value, around 0.9. I have tried varying the number of top genes in my ranked vector from 200 to 500, and also adjusting the minSize parameter, but I still get very high p-values. Can anyone please help me figure out where the mistake is?
library(Seurat)
library(dplyr)
library(stringr)
library(fgsea)

markers <- FindAllMarkers(sobj, verbose = TRUE, only.pos = TRUE)

# Remove mito and ribo genes, filter by adj_p_val
markers_filtered <- markers %>%
  filter(!str_detect(gene, "^MT-") & !str_detect(gene, "^RPS") & !str_detect(gene, "^RPL")) %>%
  filter(p_val_adj < 0.05)

# Prepare ranked vector
markers_ranked_vector <- list()
for (i in unique(markers_filtered$cluster)) {
  cluster_data <- markers_filtered %>%
    filter(cluster == i)
  top_genes <- cluster_data %>%
    arrange(desc(avg_log2FC)) %>%
    slice_head(n = 300)
  # Create a named vector
  markers_ranked_vector[[as.character(i)]] <- setNames(top_genes$avg_log2FC, top_genes$gene)
}

# Download msigdb hallmark genesets
hallmark <- msigdbr::msigdbr(species = "Homo sapiens", category = "H")
msigdbr_list <- split(x = hallmark$gene_symbol, f = hallmark$gs_name)

# Run fgsea on the ranked vector of each cluster
fgsea_results <- lapply(markers_ranked_vector, function(x) {
  fgsea(
    pathways = msigdbr_list,
    stats = x,
    scoreType = "pos",
    minSize = 10
  )
})
Thank you for your time!
You don't need to prefilter the gene list before running GSEA. GSEA is designed to take the full gene list ranked by a statistic, not the top X genes.
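To make the "full ranked list" idea concrete, here is a minimal sketch in base R. It assumes a marker table with the same `cluster`, `gene` and `avg_log2FC` columns as the script in the question; the toy values and the helper name `build_ranked_vectors` are made up for illustration. To get every gene out of Seurat in the first place, the thresholds in `FindAllMarkers()` would probably need relaxing (for example `only.pos = FALSE`, `logfc.threshold = 0`, `min.pct = 0`), which is an assumption about the setup rather than something tested on this dataset.

```r
# Toy marker table standing in for unfiltered FindAllMarkers() output.
# Column names follow the script in the question; values are made up.
markers_all <- data.frame(
  cluster    = c("0", "0", "0", "1", "1", "1"),
  gene       = c("CD3E", "MS4A1", "LYZ", "CD3E", "MS4A1", "LYZ"),
  avg_log2FC = c(2.1, -1.4, 0.3, -0.8, 3.0, -0.2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one named, decreasingly sorted statistic
# vector per cluster, keeping every gene instead of slicing the top N.
build_ranked_vectors <- function(markers) {
  per_cluster <- split(markers, markers$cluster)
  lapply(per_cluster, function(df) {
    stats <- setNames(df$avg_log2FC, df$gene)
    sort(stats, decreasing = TRUE)
  })
}

ranked <- build_ranked_vectors(markers_all)
print(ranked[["0"]])
```

Each element of `ranked` could then be passed as `stats` to `fgsea()`. Since a full list contains both positive and negative statistics, the default `scoreType = "std"` is likely a better fit than `"pos"`.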
I see, so I don't need to select the top 300 genes? I can pass the entire result of my DEG analysis to fgsea? This includes around 100 genes per cluster...
I am still getting very high p-values of around 0.9. What can I do in this situation? Should I redo my clustering or use different parameters for the [...]?
Have you tried using pseudo-bulk analysis for DE and then GSEA? That's generally the recommended option, as it is more stable than the default Seurat FindMarkers().
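As a rough illustration of what pseudo-bulk aggregation means, the sketch below sums raw counts over all cells that share a sample and a cluster, using only base R. The count matrix, the `sample` and `cluster` metadata columns, and the helper name `pseudobulk` are all hypothetical; it also assumes the dataset has several samples or replicates, which the thread does not state. In Seurat, `AggregateExpression()` performs a similar aggregation.

```r
# Toy count matrix: 3 genes x 6 cells (values are made up).
counts <- matrix(
  c(1, 0, 2, 3, 0, 1,
    0, 5, 1, 0, 2, 2,
    4, 1, 0, 1, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:6))
)

# Hypothetical per-cell metadata: sample of origin and cluster label.
cell_meta <- data.frame(
  sample  = c("s1", "s1", "s2", "s2", "s1", "s2"),
  cluster = c("0", "0", "0", "0", "1", "1"),
  row.names = colnames(counts)
)

# Sum raw counts over all cells sharing the same sample and cluster.
pseudobulk <- function(counts, meta) {
  group <- paste(meta$sample, meta$cluster, sep = "_")
  # rowsum() sums rows by group, so transpose to cells x genes and back.
  t(rowsum(t(counts), group))
}

pb <- pseudobulk(counts, cell_meta)
print(pb)
```

The resulting genes-by-group matrix is the kind of input a bulk DE tool such as DESeq2 or edgeR expects, and the statistic from that test can be used to rank all genes for GSEA.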
Also consider that Hallmark isn't always the best MSigDB collection to use, especially when the differences in your data are more subtle. You should try C2 and C5 as well, as a minimum.
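A possible way to pull several collections at once is sketched below. The helper names `as_pathway_list` and `get_collections` are hypothetical. The `msigdbr()` call reuses the `species` and `category` arguments from the script in the question; newer msigdbr releases may name the collection argument differently, so treat that part as an assumption. The helper that calls msigdbr is only defined, not run, and a made-up toy table shows the shape of the output.

```r
# Turn a long gene-set table into the named list that fgsea expects,
# mirroring the split() call in the question.
as_pathway_list <- function(gene_sets) {
  split(x = gene_sets$gene_symbol, f = gene_sets$gs_name)
}

# Hypothetical helper: fetch several MSigDB collections through msigdbr.
# Defined but not called here, because it needs the msigdbr package
# and its data to be installed.
get_collections <- function(collections = c("H", "C2", "C5")) {
  out <- lapply(collections, function(collection) {
    gene_sets <- msigdbr::msigdbr(species = "Homo sapiens", category = collection)
    as_pathway_list(gene_sets)
  })
  setNames(out, collections)
}

# Toy table with the two columns used above (made-up gene sets).
toy_sets <- data.frame(
  gs_name     = c("SET_A", "SET_A", "SET_B"),
  gene_symbol = c("CD3E", "LYZ", "MS4A1"),
  stringsAsFactors = FALSE
)
pathways <- as_pathway_list(toy_sets)
print(pathways)
```

With msigdbr available, `get_collections()` would return one pathway list per collection, and each could be passed to `fgsea()` in a separate call so that the adjusted p-values are computed within a collection.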