Metabolomics Pathway Enrichment - Mummichog p value selection
2.7 years ago
ian.will ▴ 30

Hi all, I'm working through a LC-MS/MS dataset and am using the Mummichog pathway enrichment analysis through the MetaboAnalyst interface.

Mummichog enrichment method: In a nutshell, you input a list of simplified metabolomic features (single m/z values with retention times) along with p-values. The p-values divide the features into a significant set, which is tested for enrichment of associated KEGG pathways, and the background features to compare against. Mummichog works a little differently from the more straightforward Fisher's Exact Test one might run on well-annotated gene expression data: it takes unidentified m/z values, makes multiple, individually noisy compound-level annotations per feature, and then compares things in bulk at the pathway level to give over-representation metrics.

Mummichog output: Among the outputs is a list of:

  • identified pathways,
  • a Fisher's Exact Test (aka hypergeometric) p-value (presumably raw, not multiple-testing corrected),
  • a Gamma p-value based on permutations: the total feature list is randomly resampled many times, each time drawing as many features as there are in the significant set, to build a Gamma null distribution.
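To make the two p-values concrete, here is a minimal Python sketch. It is not mummichog's actual code, and all function names are hypothetical. It assumes each feature is already assigned to a pathway, which skips the noisy m/z-to-compound annotation step entirely, and it ranks the observed p-value against the permutation null empirically instead of fitting a Gamma distribution to it.

```python
import random
from math import comb


def hypergeom_sf(k, N, K, n):
    """One-sided hypergeometric p-value P(X >= k).

    N = total features, K = features in the pathway,
    n = size of the significant set, k = pathway hits in that set.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def permutation_null(features, pathway, n_sig, n_perm=1000, seed=0):
    """Null p-values from random 'significant' sets of size n_sig.

    Each permutation draws n_sig features from the full list and
    recomputes the hypergeometric p-value for the pathway.
    """
    rng = random.Random(seed)
    N = len(features)
    K = sum(f in pathway for f in features)
    null = []
    for _ in range(n_perm):
        draw = rng.sample(features, n_sig)
        k = sum(f in pathway for f in draw)
        null.append(hypergeom_sf(k, N, K, n_sig))
    return null


def empirical_p(observed_p, null_ps):
    """Fraction of null p-values at least as small as the observed one.

    Simplification: the tool described above fits a Gamma distribution
    to the null; this uses a plain rank with a +1 correction instead.
    """
    return (1 + sum(p <= observed_p for p in null_ps)) / (1 + len(null_ps))
```

Calling `permutation_null` with `n_sig` equal to the full length of `features` returns the same p-value for every permutation, because each draw is the whole list. That is the extreme case of the large-significant-set situation described in the question.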

Question/problem: I've seen folks use either the FET p-value or the Gamma p-value in the literature ... the Gamma certainly seems more correct. However, I've seen that it does not behave in a very meaningful way when the list of significant features is very large. In some cases the significant features are the majority of the list, and intuitively, resampling the total list for a null set that contains most of the features does not work very well. On the other hand, with smaller feature lists, I've noticed the Gamma p-value can seem overly sensitive and call significant enrichment when the "expected" number of features and the "in sig. list" number are nearly the same and the FET p-value is nearly 1.0. I could start doing things like only using the Gamma p-value when significant features are < 50% of the input, requiring a 1.25-fold minimum enrichment, etc. But I have not seen folks grapple with this in the papers I've read. If something as straightforward as a Benjamini-Hochberg multiple-testing FDR correction were appropriate for the way mummichog works, I'm sure the authors would have integrated it, so I don't think it's that easy.
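For concreteness, the ad hoc filter I'm describing could look like the sketch below. The function and parameter names are hypothetical, the 50% and 1.25-fold cutoffs are just the example values mentioned above, and the 0.05 significance level is an assumption on my part. It is a heuristic to discuss, not a validated procedure.

```python
def passes_filter(gamma_p, hits_sig, expected_hits, n_sig, n_total,
                  alpha=0.05, max_sig_fraction=0.5, min_fold=1.25):
    """Return True if a pathway survives the ad hoc filter.

    gamma_p       : permutation-based (Gamma) p-value for the pathway
    hits_sig      : pathway hits among the significant features
    expected_hits : hits expected by chance for this pathway
    n_sig/n_total : significant and total feature counts
    """
    if n_total <= 0 or expected_hits <= 0:
        return False
    # Distrust the permutation null when most features are significant.
    if n_sig / n_total >= max_sig_fraction:
        return False
    # Require a minimum fold enrichment over the expected hit count.
    if hits_sig / expected_hits < min_fold:
        return False
    return gamma_p < alpha
```

With these defaults, a pathway with 10 hits against 9.5 expected would be rejected regardless of its Gamma p-value, which is the "nearly the same number" case I'm worried about.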

Any thoughts about this problem generally - or, experience with Mummichog specifically?

Thank you!!

enrichment mummichog metabolomics • 1.1k views