For the second question, the universe/background problem has been debated for at least ten years. Some argue that the background should include all genes your platform could possibly sequence, or all genes actually detected in your particular experiment. Others argue that the background should at least be limited to the genes specific to your cell type, or to some other very basic, fundamental feature of your experimental material, e.g. whether you sequenced cells with high copy number variation, or only immune cells.
In my case, for example, Seurat returns a set of 2000 highly variable genes (out of an initial set of ~23k), which it then tests for differential expression. According to the proponents of background trimming, I should use these 2000 genes as the background for my enrichment test, but my hunch is that this severs the link between our data and the physical reality of actual genomes, which contain far more than 2000 genes.
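To see what is at stake, here is a minimal Python sketch (with made-up overlap counts) of how the same overlap yields a different hypergeometric p-value depending on the chosen universe. Note that shrinking the universe also shrinks the geneset, since only its members present in the universe count:

```python
from scipy.stats import hypergeom

# Hypothetical counts: 8 of our 120 DE genes overlap the geneset.
k, n = 8, 120                        # overlap, number of DE genes tested
scenarios = {
    "2000 HVGs only": (2000, 60),    # (universe size, geneset members in it)
    "full 23k genes": (23000, 300),
}
for label, (N, K) in scenarios.items():
    # P(overlap >= k) under the hypergeometric null
    p = hypergeom.sf(k - 1, N, K, n)
    print(f"{label}: p = {p:.2g}")
```

The overlap is identical in both scenarios; only the background assumption changes, and with it the apparent significance.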
We should also not forget how the genesets that we test for enrichment were generated in the first place. For example, the study that contributed the FLORIO_NEOCORTEX geneset to MSigDB used the following background when deriving its insight:
> Expressed genes were defined using a cutoff of FPKM > 1. Differentially expressed genes were defined using a cutoff of p < 0.01.
So from the get-go, the authors of that study will most likely have had a different background gene set from yours, even though you compare your results to their geneset. Furthermore, the authors filter their data aggressively in order to obtain the geneset, as illustrated by the filtering workflow figure in their paper.
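As a rough sketch of that kind of filtering cascade (the file and column names here are hypothetical placeholders; the cutoffs are the ones quoted above):

```python
import pandas as pd

# Hypothetical per-gene table; "florio_like_expression.csv" and its
# columns are placeholders, not the study's actual data.
df = pd.read_csv("florio_like_expression.csv")  # columns: gene, fpkm, pval

expressed = df[df["fpkm"] > 1]                  # "expressed": FPKM > 1
de = expressed[expressed["pval"] < 0.01]        # DE among expressed: p < 0.01

print(f"{len(df)} annotated -> {len(expressed)} expressed -> {len(de)} DE")
```

Each cutoff shrinks the effective background before the geneset is even published.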
Given the varying conditions under which the reference genesets (from MSigDB, for example) were generated, I think we should treat them all as if they had been derived from the full genome of the organism of interest, unless the authors of the pathway/geneset explicitly state that the dataset used to generate it was limited to a specific set of genes (e.g. 2000 genes) and list them.
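Under that assumption, an enrichment test could look like the following minimal Python sketch. The GMT parsing follows MSigDB's tab-separated format; the file names and the DE gene list are hypothetical placeholders:

```python
from scipy.stats import fisher_exact

def read_gmt(path):
    """Parse a .gmt file: set_name <tab> description <tab> gene1 <tab> ..."""
    genesets = {}
    with open(path) as fh:
        for line in fh:
            name, _desc, *genes = line.rstrip("\n").split("\t")
            genesets[name] = set(genes)
    return genesets

def enrichment_p(de_genes, geneset, universe):
    """One-sided Fisher's exact test of a DE list against a geneset."""
    de, gs = set(de_genes) & universe, set(geneset) & universe
    a = len(de & gs)                # DE and in the geneset
    b = len(de - gs)                # DE only
    c = len(gs - de)                # geneset only
    d = len(universe) - a - b - c   # neither
    _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
    return p

# Hypothetical inputs: the full annotated gene list as the universe,
# not just the 2000 HVGs that Seurat happened to test.
universe = {line.strip() for line in open("all_annotated_genes.txt")}
genesets = read_gmt("msigdb_subset.gmt")
de_genes = ["SOX2", "PAX6", "EOMES"]  # placeholder DE list
print(enrichment_p(de_genes, genesets["FLORIO_NEOCORTEX"], universe))
```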
I guess it's best summarized here:
> Basically, it is about how wrong is still tolerable to you. This then ties into the second problem: there seems to be little to no penalty for doing an incorrect enrichment analysis. The person sticking to accuracy hamstrings themselves. So the "beware before you publish" is more of wishful thinking of how it should be but isn't.