Hi,
Probably someone has already raised this question: how large should a gene set be to be used as input for fgsea or another GSEA analysis?
I use GO biological process from the MSigDB collections, and it seems to be a large collection (7529 gene sets, I believe).
When I run the fgsea analysis with minSize = 15, maxSize = 500, and nperm = 1000, I do not find any significant pathways (adjusted p-value < 0.05). However, once I increase the number of permutations to 10000, I get 277 significant pathways.
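For reference, my call looks roughly like the sketch below (the example data shipped with fgsea stands in here for my own ranked statistic and the MSigDB GO BP collection, which I load with gmtPathways()):

```r
library(fgsea)

# Stand-ins for my real inputs: in my analysis 'pathways' comes from
# gmtPathways() on the MSigDB GO BP .gmt file, and the stats vector is
# my named, ranked gene-level statistic.
data(examplePathways)
data(exampleRanks)

res <- fgsea(pathways = examplePathways,
             stats    = exampleRanks,
             minSize  = 15,
             maxSize  = 500,
             nperm    = 10000)  # with nperm = 1000 I see no hits

sum(res$padj < 0.05, na.rm = TRUE)  # number of significant pathways
```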
I am just wondering whether increasing the number of permutations is a good way to compensate for a large collection such as GO biological process, or whether I should instead set more stringent minSize and maxSize parameters?
Kindly provide any information about this.
Thanks in advance.
Best,
Ayu
Is there actually an argument against using sets larger than 500 genes with fgsea? Or is this some relic from the original GSEA implementation? I can think of situations where one compares completely different cell types and therefore has a lot of DEGs, all of which shape the identity of one cell type or the other.
There are a couple of related points why limiting the gene set size can be a good idea:

1. Interpretability: GO BP terms with many hundreds of genes are usually very broad, so even a significant hit on such a set is hard to turn into a specific biological conclusion.
2. Redundancy: because of the GO hierarchy, the largest terms largely contain the smaller, more specific ones, so they tend to crowd the top of the results without adding information.
3. Speed: the running time grows with the total size of the gene sets, so dropping the very large ones makes the permutation procedure noticeably cheaper.
So, overall, for the GO collection setting maxSize to 500 seems to be a good idea. But for some collections, such as transcription factor targets or ChIP-seq-derived gene sets, it can be advantageous not to set any limit at all.
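As a minimal sketch of what that looks like in practice (using the example data bundled with fgsea in place of real collections; the object names are illustrative):

```r
library(fgsea)
data(examplePathways)
data(exampleRanks)

# Looking at the size distribution of a collection can help choose limits
summary(lengths(examplePathways))

# GO-style collection: keep the conventional 15-500 window
resGO <- fgsea(pathways = examplePathways, stats = exampleRanks,
               minSize = 15, maxSize = 500)

# Target-style collection (TF targets, ChIP-seq sets): no upper limit
resTargets <- fgsea(pathways = examplePathways, stats = exampleRanks,
                    minSize = 15, maxSize = Inf)
```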