8 days ago
bio_info
Hi all,
I am trying to run an fgsea analysis using the fgsea package in R for all cluster biomarkers in my scRNA-seq dataset. I have written a short script to do this. However, all the resulting pathways have a very high adjusted p-value, around 0.9. I have tried varying the number of top genes in my ranked vector, from 200 to 500, and also adjusting the minSize parameter, but I still get very high p-values. Can anyone please help me figure out where the mistake is?
library(Seurat)   # FindAllMarkers()
library(dplyr)    # %>%, filter(), arrange(), slice_head()
library(stringr)  # str_detect()
library(fgsea)

markers <- FindAllMarkers(sobj, verbose = TRUE, only.pos = TRUE)

# Remove mito and ribo genes, filter by adj_p_val
markers_filtered <- markers %>%
  filter(!str_detect(gene, "^MT-") & !str_detect(gene, "^RPS") & !str_detect(gene, "^RPL")) %>%
  filter(p_val_adj < 0.05) %>%
  arrange(desc(avg_log2FC))

# Prepare ranked vector (one named vector per cluster)
markers_ranked_vector <- list()
for (i in unique(markers_filtered$cluster)) {
  cluster_data <- markers_filtered %>%
    filter(cluster == i)
  top_genes <- cluster_data %>%
    arrange(desc(avg_log2FC)) %>%
    slice_head(n = 300)
  # Create a named vector: values are avg_log2FC, names are gene symbols
  markers_ranked_vector[[as.character(i)]] <- setNames(top_genes$avg_log2FC, top_genes$gene)
}

# Download msigdb hallmark genesets
hallmark <- msigdbr::msigdbr(species = "Homo sapiens", category = "H")
msigdbr_list <- split(x = hallmark$gene_symbol, f = hallmark$gs_name)

# Run fgsea on each cluster's ranked vector
fgsea_results <- lapply(markers_ranked_vector, function(x) {
  fgsea(
    pathways = msigdbr_list,
    stats = x,
    scoreType = "pos",
    minSize = 10
  )
})
Thank you for your time!
You don't need to prefilter the genes before running GSEA. GSEA is designed to take in the full gene list ranked by a statistic, not the top X genes.
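As a rough illustration of what "full gene list ranked by a statistic" could look like in practice, here is a minimal sketch. `make_ranks` is a hypothetical helper (not part of Seurat or fgsea), and the toy table only stands in for the output of a DE test. The commented lines at the end assume the `sobj` and `msigdbr_list` objects from the question, and that the DE test is run with its fold-change and expression filters relaxed so that most genes are returned.

```r
# Hypothetical helper: turn one cluster's DE table (genes as row names)
# into a full ranked vector, without any top-N or p-value filtering.
make_ranks <- function(de, stat_col = "avg_log2FC") {
  de <- de[is.finite(de[[stat_col]]), , drop = FALSE]  # drop NA/Inf statistics
  ranks <- setNames(de[[stat_col]], rownames(de))      # named numeric vector
  sort(ranks, decreasing = TRUE)                       # most up-regulated first
}

# Toy DE table standing in for real results (values are made up)
toy_de <- data.frame(
  avg_log2FC = c(1.8, -0.4, 0.05, -2.1),
  row.names  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D")
)
ranks <- make_ranks(toy_de)

# With real data (assumption: cluster "0" exists in `sobj`):
# de    <- Seurat::FindMarkers(sobj, ident.1 = "0", logfc.threshold = 0, min.pct = 0)
# ranks <- make_ranks(de)
# res   <- fgsea::fgsea(pathways = msigdbr_list, stats = ranks, minSize = 10, maxSize = 500)
```

Because the ranked vector now contains both positive and negative values, the sketch leaves `scoreType` at its default rather than `"pos"`.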
I see, so I don't need to select the top 300 genes? I can pass the entire result of my DEG analysis to fgsea? This includes around 100 genes per cluster...
I am still getting very high p-values of around 0.9. What can I do in this situation? Should I redo my clustering or use different parameters for fgsea?
Have you tried using pseudo-bulk analysis for DE and then GSEA? That's generally the recommended option, as it is more stable than the default Seurat FindMarkers().
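For readers unfamiliar with the term, the aggregation step behind pseudo-bulk analysis can be sketched in base R as below. This is only an illustration on a made-up count matrix: the sample and cluster labels are assumptions, and a real analysis would use the raw counts and sample metadata from the Seurat object, then pass the aggregated matrix to a bulk DE tool.

```r
# Toy raw count matrix: 3 genes x 6 cells (made-up values)
counts <- matrix(
  c(1, 0, 2, 3, 1, 0,
    0, 4, 1, 0, 2, 2,
    5, 1, 0, 1, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), paste0("cell", 1:6))
)

# Per-cell metadata (assumed): which sample and cluster each cell belongs to
sample_id <- c("s1", "s1", "s2", "s2", "s1", "s2")
cluster   <- c("0",  "0",  "0",  "0",  "1",  "1")

# One pseudo-bulk profile per sample x cluster combination:
# rowsum() sums rows by group, so we transpose to cells x genes and back.
group      <- paste(sample_id, cluster, sep = "_")
pseudobulk <- t(rowsum(t(counts), group = group))

print(pseudobulk)  # genes x (sample_cluster) matrix of summed counts
```

The resulting columns behave like bulk RNA-seq samples, so the DE statistic used for ranking comes from replicate-level variation rather than from individual cells.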
Otherwise:
Also consider that Hallmarks isn't always the best MSigDB collection to use, especially when the differences in your data are more subtle. You should try C2 and C5 as a minimum as well.