Hi,
I have a Metabolon dataset with multiple factors (Gender, Genotype, Treatment), including interactions between them. I tried MetaboAnalyst but found it hard to model my question there, since it does not support GLM modeling.
I have used DESeq2/limma to analyze RNA-seq data with similar experimental designs, and it worked very well. My question is: could I apply those two packages to my metabolomics analysis? Is there anything I need to pay special attention to before doing so?
If I wanted to apply camera/limma, could I use it in a way similar to RNA-seq?
Thanks & Best regards,
Raymond
That's awesome, Kevin! Thanks for your kind suggestion. I'm using a targeted metabolite database (only ~800 metabolites), so I only removed metabolites with missingness > 50%. I also applied sample-wise normalization, log transformation, and autoscaling (Z-scale), and checked that the overall data distribution looks normal. I followed the tutorials in MetaboAnalyst, which only support the t-test (one factor, 2 levels) or ANOVA (one factor, >2 levels, at least 3 replicates/level).
I tried MSEA (Metabolite Set Enrichment Analysis), but it seems that you cannot define your own metabolite set. Do you have any idea how to do this in the metabolomics field?
Best regards, Raymond
Cool! With your data (on the Z scale) you can perform your own tests in R or STATA. For example, if you have an outcome variable, like Case-Control, then you can perform a binary logistic regression. Have you used regression models in the past?
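As a rough illustration of the binary logistic regression mentioned above, here is a minimal sketch in R. The data are simulated, and the column names `status`, `gender` and `metabolite1` are assumptions made up for the example, not names from the actual dataset.

```r
# Minimal sketch: binary logistic regression of Case/Control status on one
# Z-scaled metabolite, adjusting for a second factor. All data are simulated.
set.seed(1)
n <- 40
dat <- data.frame(
  # "Control" is the reference level, so the model describes P(Case)
  status = factor(rep(c("Control", "Case"), each = n / 2),
                  levels = c("Control", "Case")),
  gender = factor(rep(c("F", "M"), times = n / 2))
)
# Hypothetical metabolite, shifted upwards in cases
dat$metabolite1 <- rnorm(n, mean = ifelse(dat$status == "Case", 0.8, 0))

fit <- glm(status ~ metabolite1 + gender, data = dat,
           family = binomial(link = "logit"))

coefs <- summary(fit)$coefficients           # estimates, SEs, z and p values
odds_ratio <- exp(coef(fit)["metabolite1"])  # odds ratio per 1 SD of metabolite
print(coefs)
print(odds_ratio)
```

Because the metabolite is on the Z scale, the odds ratio can be read as the change in odds of being a case per one standard deviation of that metabolite.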
You can also just use the Student's t-test (t.test()), ANOVA (aov()), et cetera. There are some tutorials for ANOVA available, too. In MetaboAnalyst, you can also use the KEGG pathway analysis tool, no? - https://www.metaboanalyst.ca/faces/ModuleView.xhtml
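A minimal sketch of what `t.test()` and `aov()` could look like for a multi-factor design such as Gender, Genotype and Treatment. The data are simulated and the factor levels (`WT`/`KO`, `Vehicle`/`Drug`) are hypothetical placeholders.

```r
# Minimal sketch: t-test for one factor, then a multi-factor ANOVA with an
# interaction term. All data and level names are simulated/hypothetical.
set.seed(2)
dat <- expand.grid(
  gender    = c("F", "M"),
  genotype  = c("WT", "KO"),
  treatment = c("Vehicle", "Drug"),
  replicate = 1:4,
  stringsAsFactors = TRUE
)
# Hypothetical Z-scaled metabolite with a treatment effect
dat$metabolite1 <- rnorm(nrow(dat)) + 1 * (dat$treatment == "Drug")

# One factor, two levels: Welch's t-test (the default in t.test())
tt <- t.test(metabolite1 ~ treatment, data = dat)

# Several factors plus a genotype-by-treatment interaction
fit_aov <- aov(metabolite1 ~ gender + genotype * treatment, data = dat)

print(tt$p.value)
print(summary(fit_aov))
```

Note that `aov()` reports sequential (Type I) sums of squares, so in an unbalanced design the order of the terms in the formula affects the results.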
What is the ultimate aim of your project?
Thanks, Kevin. I have used regression models in my homework before. I can use the KEGG pathway tool, but fewer than 50% of my metabolites could be mapped to a KEGG ID, so the mapping rate is too low. The ultimate aim is to study the potential effects of two drugs, and we want to test whether there are any indications from the metabolites. We had a small sample size and an unbalanced experimental design (with potential confounding effects because of this imbalance); that's why I wanted to use a regression model, to try to separate out the confounding factors.
I see. So, the models would be:
When you identify key metabolites with p<0.05, you can then create a final model and derive AUC (from ROC analysis).
You should also perform cross validation on the final model with cv.glm() (from the boot package). If you need help, I have an R package that can run the models: https://github.com/kevinblighe/RegParallel
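The workflow described above (screen metabolites at p<0.05, build a final model, derive AUC, cross-validate with `cv.glm()`) might look roughly like the sketch below. It is not the code from the original post: the data are simulated, the names `status` and `met1` to `met5` are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formula in base R rather than with a dedicated ROC package.

```r
# Minimal sketch: per-metabolite screen, final model, AUC, cross-validation.
library(boot)  # provides cv.glm()

set.seed(3)
n <- 60
dat <- data.frame(status = rep(c(0, 1), each = n / 2))  # 0 = control, 1 = case
for (j in 1:5) dat[[paste0("met", j)]] <- rnorm(n)      # simulated Z-scaled data
dat$met1 <- dat$met1 + 1.0 * dat$status                 # two informative metabolites
dat$met2 <- dat$met2 + 0.8 * dat$status

# 1. One logistic model per metabolite; collect the metabolite p-values
mets <- paste0("met", 1:5)
pvals <- sapply(mets, function(m) {
  fit <- glm(reformulate(m, response = "status"), data = dat, family = binomial)
  summary(fit)$coefficients[m, "Pr(>|z|)"]
})
keep <- names(pvals)[pvals < 0.05]
if (length(keep) == 0) keep <- names(which.min(pvals))  # fallback for the sketch

# 2. Final model with the selected metabolites
final <- glm(reformulate(keep, response = "status"), data = dat, family = binomial)

# 3. AUC from the ranks of the fitted probabilities (Mann-Whitney formula)
pred <- predict(final, type = "response")
r  <- rank(pred)
n1 <- sum(dat$status == 1)
n0 <- sum(dat$status == 0)
auc <- (sum(r[dat$status == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# 4. 5-fold cross-validated misclassification rate at a 0.5 cut-off
cost <- function(y, p) mean(abs(y - p) > 0.5)
cv <- cv.glm(dat, final, cost = cost, K = 5)

print(keep)
print(auc)
print(cv$delta[1])
```

One caveat with this layout: the metabolites are selected on the full dataset before cross-validation, so the cross-validated error is likely to be optimistic. Repeating the selection inside each fold would give a fairer estimate.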
Cool R packages, Kevin! I did not make myself clear, but I think I get your point: final <- glm(metabolite ~ drug1 + drug2 + drug1:drug2). Thanks for the fruitful discussions.
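For reference, a minimal runnable sketch of that last model on simulated data. The `gender` covariate is an assumption added here to show where a potential confounder would enter the formula; it is not part of the formula quoted above.

```r
# Minimal sketch: one metabolite as the outcome, two drugs and their interaction
# as predictors, plus a hypothetical confounder. All data are simulated.
set.seed(4)
dat <- expand.grid(drug1 = c(0, 1), drug2 = c(0, 1), replicate = 1:5)
dat$gender <- factor(sample(c("F", "M"), nrow(dat), replace = TRUE))
dat$metabolite <- rnorm(nrow(dat)) +
  0.5 * dat$drug1 + 0.5 * dat$drug2 + 1.0 * dat$drug1 * dat$drug2

# glm() defaults to family = gaussian, i.e. an ordinary linear model
final <- glm(metabolite ~ drug1 + drug2 + drug1:drug2 + gender, data = dat)

coefs <- summary(final)$coefficients
print(coefs)  # the "drug1:drug2" row is the interaction term
```

With ~800 metabolites, the same formula could be fitted in a loop over metabolite columns, collecting the interaction p-value from each fit.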