Question

Could I apply limma to metabolon concentrations?

5.8 years ago

shangyuan5000 ▴ 30

Hi,

I have a metabolon dataset with multiple factors (Gender, Genotype, Treatment) and interactions between those factors. I tried MetaboAnalyst, but found it hard to model my question there because it does not support GLM modeling.

I have used DESeq2/limma to analyze RNA-seq data with similar experimental designs, and it worked very well. Could I apply those two packages to my metabolon analysis? Is there anything I need to pay special attention to before doing so?

If I wanted to apply camera/limma, could I use it in the same way as for RNA-seq?

Thanks & Best regards,
Raymond

rna-seq • 3.5k views

ADD COMMENT • link updated 5.8 years ago by Kevin Blighe 88k • written 5.8 years ago by shangyuan5000 ▴ 30

score 2 · Answer 1 · 2019-01-30

5.8 years ago

Kevin Blighe 88k

The metabolomics data that you received may not be on the scale/distribution expected by limma. Limma fits a linear regression model independently for each variable. How is your metabolomics data distributed? For limma, the data should be approximately normally distributed (Edit: for linear regression, the assumption is actually that the residuals are normally distributed).
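As a rough illustration of the residual assumption, the sketch below fits the same one-factor linear model to a simulated metabolite on the raw scale and on the log2 scale, then tests the residuals with `shapiro.test()`. The data, the group labels and the object names are all made up for the example; they are not from the poster's dataset.

```r
# Hypothetical example: one right-skewed metabolite measured in two groups.
set.seed(11)
group <- factor(rep(c("control", "treated"), each = 15))
raw   <- rlnorm(30, meanlog = 5 + 0.5 * (group == "treated"), sdlog = 0.8)

# Same model on the raw scale and on the log2 scale
fit_raw <- lm(raw ~ group)
fit_log <- lm(log2(raw) ~ group)

# Shapiro-Wilk test on the residuals (small p-value = evidence against normality)
p_raw <- shapiro.test(residuals(fit_raw))$p.value
p_log <- shapiro.test(residuals(fit_log))$p.value
c(raw_scale = p_raw, log_scale = p_log)

# A visual check is often more informative than a single test:
# qqnorm(residuals(fit_log)); qqline(residuals(fit_log))
```

With many metabolites, the same check can be run per metabolite, looking at the overall pattern rather than any single p-value.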

Here is the QC that I applied to my metabolomics study data (from Metabolon):

**Metabolomics quality control**

 1. Start with instrument-produced relative abundance metabolite levels

 2. Remove metabolites if:
 - Level in QC samples has coefficient of variation (CoV) > 25%
 - Missingness > 10% across test samples
 - No variability across test samples based on interquartile range (IQR)

 3. Remove samples with metabolite missingness > 10%

 4. Filter out unidentified/unknown metabolites and those classified as
    xenobiotic chemicals

 5. Convert NA values to 0

After that, the data was log-transformed and then converted to Z scale. With the Z-scaled data, I performed bootstrapped unbiased clustering (or 'machine learning', if I followed trends).
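The filtering and scaling steps above could be sketched in base R roughly as follows. This is only an illustration on simulated data, not the exact code used in the study: the object names (`qc`, `test`), the `+ 1` offset before the log (needed because NA values were set to 0), and scaling per metabolite rather than per sample are all assumptions. Step 4 is omitted because it depends on the annotation file.

```r
# Simulated stand-in data: rows = metabolites, columns = samples (all hypothetical).
set.seed(1)
n_met <- 20
base  <- rlnorm(n_met, meanlog = 10, sdlog = 1)
qc    <- sapply(1:3, function(i) base * rlnorm(n_met, 0, 0.05))  # pooled QC samples
test  <- sapply(1:9, function(i) base * rlnorm(n_met, 0, 0.5))   # test samples
rownames(qc) <- rownames(test) <- paste0("met", seq_len(n_met))
colnames(test) <- paste0("s", 1:9)
qc[2, ]      <- base[2] * c(0.5, 1, 1.5)  # met2: noisy in QC (CoV = 50%)
test[1, 1:5] <- NA                        # met1: missing in 5 of 9 samples
test[3, ]    <- base[3]                   # met3: no variability at all

# Step 2: metabolite-level filters
cov_qc   <- apply(qc, 1, sd) / rowMeans(qc)      # coefficient of variation in QC
miss_met <- rowMeans(is.na(test))                # fraction missing per metabolite
iqr_met  <- apply(test, 1, IQR, na.rm = TRUE)    # variability across test samples
keep_met <- cov_qc <= 0.25 & miss_met <= 0.10 & iqr_met > 0
test <- test[keep_met, , drop = FALSE]

# Step 3: sample-level filter
keep_smp <- colMeans(is.na(test)) <= 0.10
test <- test[, keep_smp, drop = FALSE]

# Step 5: remaining NA values to 0
test[is.na(test)] <- 0

# Log-transform (offset of 1 is an assumption) and Z-scale each metabolite
logged <- log2(test + 1)
z <- t(scale(t(logged)))
dim(z)
```

The thresholds (25%, 10%) are the ones quoted in the list; they can be relaxed for a small targeted panel.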

Kevin

ADD COMMENT • link 5.2 years ago by Kevin Blighe 88k

That's awesome, Kevin! Thanks for your kind suggestion. I'm using a targeted metabolite database (only ~800 metabolites), so I only remove metabolites with missingness > 50%. I also apply "Sample-wise normalization, Log Transformation, and Autoscaling (Z-scale)", and check that the overall data distribution looks "Normal". I followed the tutorials in MetaboAnalyst, which only supports the t-test (one factor, 2 levels) or ANOVA (one factor, >3 levels, at least 3 replicates/level).
I tried "MSEA" (Metabolite Set Enrichment Analysis), but it seems that you cannot define your own "metabolite set". Do you have any idea how to do this in the metabolomics field?

Best regards, Raymond

ADD REPLY • link 5.8 years ago by shangyuan5000 ▴ 30

Cool! With your data (on the Z scale) you can perform your own tests in R or STATA. For example, if you have an outcome variable, like Case-Control, then you can perform a binary logistic regression:

summary(glm(CaseControl ~ metabolite1, family = binomial(link = 'logit')))
summary(glm(CaseControl ~ metabolite2, family = binomial(link = 'logit')))
*et cetera*
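Rather than typing one call per metabolite, the same test can be looped over all metabolite columns. The sketch below is one possible way to do it, using simulated data; the data frame `dat`, its column names and the Benjamini-Hochberg adjustment step are assumptions added for illustration.

```r
# Simulated stand-in data: a binary outcome plus five Z-scaled metabolites.
set.seed(42)
n   <- 60
dat <- data.frame(CaseControl = rbinom(n, 1, 0.5))
for (j in 1:5) dat[[paste0("metabolite", j)]] <- rnorm(n)
dat$metabolite1 <- dat$metabolite1 + 1.5 * dat$CaseControl  # planted signal

metabolites <- grep("^metabolite", names(dat), value = TRUE)

# One binary logistic regression per metabolite
results <- do.call(rbind, lapply(metabolites, function(m) {
  fit   <- glm(reformulate(m, response = "CaseControl"),
               data = dat, family = binomial(link = "logit"))
  coefs <- summary(fit)$coefficients
  data.frame(metabolite = m,
             log_odds   = coefs[m, "Estimate"],
             p_value    = coefs[m, "Pr(>|z|)"])
}))

# Adjust for testing many metabolites (Benjamini-Hochberg)
results$p_adj <- p.adjust(results$p_value, method = "BH")
results[order(results$p_value), ]
```

Covariates such as Gender or Genotype could be added to the right-hand side of each formula if confounding is a concern.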

Have you used regression models in the past?

You can also just use the Student's t-test (t.test()), ANOVA (aov()), et cetera. Some tutorials for ANOVA, here:

In MetaboAnalyst, you can also use the KEGG pathway analysis tool, no? - https://www.metaboanalyst.ca/faces/ModuleView.xhtml

What is the ultimate aim of your project?

ADD REPLY • link 5.8 years ago by Kevin Blighe 88k

Thanks, Kevin. I have used regression models in my homework before. I can use the KEGG pathway tool, but fewer than 50% of my metabolites map to a KEGG ID, so the mapping rate is too low. The ultimate aim is to study the potential effects of, and interaction between, two drugs, and we want to test whether there are any indications from the metabolites. We had a small sample size and an unbalanced experimental design (with potential confounding effects because of that imbalance), which is why I wanted to use a regression model to separate out the confounding factors.

ADD REPLY • link 5.8 years ago by shangyuan5000 ▴ 30

I see. So, the models would be:

glm(drug ~ metabolite1, family = binomial)
glm(drug ~ metabolite2, family = binomial)
...

When you identify key metabolites with p<0.05, you can then create a final model and derive AUC (from ROC analysis).

final <- glm(drug ~ metabolite1 + metabolite5 + metabolite16, family = binomial)

You should also perform cross-validation on the final model with cv.glm() (from the boot package).
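A minimal sketch of the final model, its AUC and a cross-validated error estimate is shown below. The data are simulated and the three metabolite names are placeholders. The AUC is computed from ranks (the Mann-Whitney identity) to avoid an extra package, and the misclassification cost function passed to `cv.glm()` is one common choice rather than the only one.

```r
library(boot)  # provides cv.glm(); a recommended package shipped with R

# Simulated stand-in data: binary drug label plus three candidate metabolites.
set.seed(7)
n   <- 80
dat <- data.frame(drug = rbinom(n, 1, 0.5))
dat$metabolite1  <- rnorm(n) + 1.0 * dat$drug
dat$metabolite5  <- rnorm(n) - 0.8 * dat$drug
dat$metabolite16 <- rnorm(n)

final <- glm(drug ~ metabolite1 + metabolite5 + metabolite16,
             data = dat, family = binomial)

# In-sample AUC from the ranks of the fitted probabilities
pred <- fitted(final)
r    <- rank(pred)
n1   <- sum(dat$drug == 1)
n0   <- sum(dat$drug == 0)
auc  <- (sum(r[dat$drug == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# 5-fold cross-validated misclassification rate (0.5 probability cut-off)
cost   <- function(y, p) mean(abs(y - p) > 0.5)
cv_err <- cv.glm(dat, final, cost = cost, K = 5)$delta[1]

c(AUC = auc, cv_error = cv_err)
```

The in-sample AUC is optimistic, especially when the metabolites were selected on the same data, so the cross-validated error is the more honest number with a small sample.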

If you need help, I have an R package that can run the models: https://github.com/kevinblighe/RegParallel

ADD REPLY • link 5.8 years ago by Kevin Blighe 88k

Cool R packages, Kevin! I did not make myself clear, but I think I get your point: final <- glm(metabolite ~ drug1 + drug2 + drug1:drug2). Thanks for the fruitful discussion.
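For reference, that model could be fitted to every metabolite with a plain loop, as in the sketch below. Everything here is simulated: the matrix `z` (metabolites in rows, samples in columns), the factor names and the balanced layout are assumptions for illustration. The design matrix built with `model.matrix()` is the same kind of object that limma's `lmFit()` accepts as its `design` argument, so the step to limma is short once the data are on a suitable scale.

```r
# Simulated stand-in data: 10 Z-scaled metabolites, 24 samples, two drug factors.
set.seed(3)
n <- 24
design_df <- data.frame(
  drug1 = factor(rep(c("no", "yes"), each = 12)),
  drug2 = factor(rep(c("no", "yes"), times = 12))
)
z <- matrix(rnorm(10 * n), nrow = 10,
            dimnames = list(paste0("met", 1:10), NULL))
both <- design_df$drug1 == "yes" & design_df$drug2 == "yes"
z["met1", both] <- z["met1", both] + 3  # planted interaction effect

# Design matrix with both main effects and their interaction
design <- model.matrix(~ drug1 * drug2, data = design_df)
colnames(design)

# Fit metabolite ~ drug1 + drug2 + drug1:drug2 for each row of z
res <- t(apply(z, 1, function(y) {
  fit <- lm(y ~ drug1 * drug2, data = design_df)
  summary(fit)$coefficients["drug1yes:drug2yes", c("Estimate", "Pr(>|t|)")]
}))
res <- data.frame(metabolite = rownames(z),
                  estimate   = res[, 1],
                  p_value    = res[, 2])
res$p_adj <- p.adjust(res$p_value, method = "BH")
res[order(res$p_value), ]
```

Confounders such as Gender or Genotype can be added as extra terms in the formula; with an unbalanced design the estimates remain valid, but their standard errors will be larger.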

ADD REPLY • link 5.8 years ago by shangyuan5000 ▴ 30