Need help with fgsea
0
0
Entering edit mode
8 days ago
bio_info ▴ 20

Hi all,

I am trying to run an fgsea analysis using the fgsea package in R for all cluster biomarkers in my scRNA-seq dataset. I have written a short script to do this. However, all the resulting pathways have a very high adjusted-p-val, around 0.9. I have tried playing with the number of top genes in my ranked vector, from 200 till 500 and also playing with the minSize parameter but I still get very high p-values. Can anyone please help me in figuring out where the mistake is?

library(fgsea)

markers <- FindAllMarkers(sobj, verbose = T, only.pos = T)

# Remove mito and ribo genes, filter by adj_p_val
markers_filtered <- markers %>%
filter(!str_detect(gene, "^MT-") & !str_detect(gene, "^RPS") & !str_detect(gene, "^RPL")) %>% 
filter(p_val_adj < 0.05) %>% 
arrange(desc(avg_log2FC))

# Prepare ranked vector
markers_ranked_vector <- list()

for (i in unique(markers_filtered$cluster)) {
cluster_data <- markers_filtered %>% 
filter(cluster == i)

top_genes <- cluster_data %>% 
arrange(desc(avg_log2FC)) %>% 
slice_head(n = 300)

# Create a named vector 
markers_ranked_vector[[as.character(i)]] <- setNames(top_genes$avg_log2FC, top_genes$gene)
}

# Download msigdb hallmark genesets
hallmark <- msigdbr::msigdbr(species = "Homo sapiens", category = "H")

msigdbr_list = split(x = hallmark$gene_symbol, f = hallmark$gs_name)

fgsea_results <- lapply(markers_ranked_vector, function(x) {
  fgsea(
    pathways = msigdbr_list,  
    stats = x,
    scoreType = "pos",
    minSize = 10
  )
})

Thank you for your time!

scRNAseq fgsea • 219 views
ADD COMMENT
0
Entering edit mode

You don't need to prefilter GSEA before running it, GSEA is designed to take in the full gene-list ranked by a statistic, not top X genes.

ADD REPLY
0
Entering edit mode

I see, so I don't need to select the top 300 genes? I can pass the entire result of my DEG to fgsea? This includes around 100 genes per cluster...

ADD REPLY
0
Entering edit mode

I am still getting very high p-values of around 0.9, what can I do in this situation? Should I redo my clustering or use different parameters for the fgsea?

ADD REPLY
0
Entering edit mode

Have you tried using pseudo-bulk analysis for DE and then GSEA? That's generally the more recommended option as it is more stable than the default Seurat FindMarkers()

Otherwise:

markers <- FindAllMarkers(sobj, verbose = T, only.pos = FALSE) ## include negative genes - GSEA needs +ve and -ve ranked enrichment to work properly

Also consider that Hallmarks isn't always the best MSigDB to use especially when the difference in your data is more sublitle. You should try C2 and C5 at as a minimum also

ADD REPLY

Login before adding your answer.

Traffic: 2080 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6