Hello Biostars!
I have been analyzing some single-cell RNA sequencing data comparing neural transcriptomes from mice treated with either vehicle or a drug. All the basic stuff is going fine.
When I get my clusters, I find few interesting, significant drug-induced DEGs. If I do GSEA on each cluster, however, things look very interesting and align very well with the published literature. All good so far.
So my problem is this: I use fgsea in R to do my GSEAs, but I am running them on 18 different clusters. That means the adjusted p-values I get from fgsea only account for the gene sets tested within each cluster; they still need to be corrected to reflect the 18 repeated analyses.
If I were to Bonferroni correct those p values, would it be acceptable to take the fgsea output adjusted p values that were already corrected for the number of gene sets tested, and then correct them again? Or is it more appropriate to take the raw p value from fgsea and somehow Bonferroni it for both the number of gene sets tested and also the number of clusters in which I am performing the tests?
Basically, can you Bonferroni an adjp that has already been Bonferronied?
Thanks very much in advance for your insights! Ed
To clarify: fgsea adjusted P-values are Benjamini-Hochberg-adjusted, not Bonferroni.
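Given that, one way to account for both the gene sets and the 18 clusters would be to pool the raw p-values (the `pval` column of the fgsea output, not `padj`) from every cluster and apply a single Benjamini-Hochberg correction to the whole pool. Below is a minimal sketch of that idea in Python, just to show the arithmetic. The cluster names, pathway names, and p-values are made up for illustration, and `bh_adjust` and `pooled_bh` are hypothetical helper names, not part of fgsea. In R, the same adjustment should be available through `p.adjust(p, method = "BH")` on the combined vector of raw p-values.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def pooled_bh(raw_by_cluster):
    """Apply one BH correction across all (cluster, pathway) tests together.

    raw_by_cluster: dict mapping cluster name -> dict of pathway -> raw p-value.
    Returns a dict mapping (cluster, pathway) -> pooled adjusted p-value.
    """
    keys = [(c, p) for c, paths in raw_by_cluster.items() for p in paths]
    raw = [raw_by_cluster[c][p] for c, p in keys]
    return dict(zip(keys, bh_adjust(raw)))


# Made-up raw p-values for two clusters and two gene sets (illustration only).
example = {
    "cluster_0": {"PATHWAY_A": 0.001, "PATHWAY_B": 0.2},
    "cluster_1": {"PATHWAY_A": 0.03, "PATHWAY_B": 0.5},
}

pooled = pooled_bh(example)
for (cluster, pathway), padj in pooled.items():
    print(cluster, pathway, round(padj, 4))
```

With this approach the number of tests in the correction is the total across all clusters (gene sets times clusters, if every set is tested in every cluster), so nothing is adjusted twice.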
Hi jevanveen,
I was thinking of running GSEA on my scRNA-seq data, but I'm still debating what the input from each cluster should be. How do you capture the complexity of the cluster? Are you using the average gene expression from each cluster? Are you sampling a few cells from each cluster?
I'd really appreciate your thoughts on that. Thanks!