how to find metabolites
1
0
Entering edit mode
6.5 years ago
Learner ▴ 280

I have a list of genes and I want to know which metabolites are assigned to those genes. is there anyone who can tell me what are the options? I prefer to have access to a database rather than web app so feel free to comment on any package or script etc

genomics metabolomics • 2.4k views
ADD COMMENT
0
Entering edit mode

Check section 4.5.2 here: https://bioconductor.org/packages/release/bioc/vignettes/piano/inst/doc/piano-vignette.pdf and reference 6 therein. They describe making 'gene sets' where each gene set is a metabolite, and the genes within each set are the enzymes related to metabolism of that metabolite. I'm interested in this as well, let me know how it goes for you.

ADD REPLY
0
Entering edit mode

Apart from what appears to be a good answer by ivanarg2, you may also find the functions of MetaboAnalyst useful. Certain functions can work with genes.

Kevin

ADD REPLY
3
Entering edit mode
6.5 years ago
ivanarg2 ▴ 80

Based on this, with some modifications:

https://support.bioconductor.org/p/107927/

library(KEGGREST)
library(org.Hs.eg.db)
library(annotate)

## Get enzyme-gene annotations
res1 = keggLink("enzyme", "hsa")
tmpDF1 = data.frame(ec = res1, gene = names(res1))

## Get compound-enzyme annotations
res2 = keggLink("compound", "enzyme")
tmpDF2 = data.frame(cpd = res2, ec = names(res2))

## Merge
df = merge(tmpDF1, tmpDF2, by="ec")
## aggDF = aggregate(df['ec'], by = df['gene'], FUN=paste) ## Optional, see original link

## Convert KEGG gene IDs to Entrez Gene IDs
convs = keggConv("hsa", "ncbi-geneid")
names(convs) = as.character(gsub("ncbi-geneid:", "", names(convs)))
df$ncbi_id = names(convs)[match(df$gene, as.character(convs))]
df$ncbi_name = getSYMBOL(df$ncbi_id, "org.Hs.eg.db")

## Convert compound IDs to compound (metabolite) names
mets = keggList("compound")
mets = setNames(str_split(mets, ";", simplify = T)[,1], names(mets))
df$cpd_name = as.character(mets)[match(df$cpd, names(mets))]

## Make a list of metabolite gene sets. Each metabolite gene set is composed of enzymes/genes involved in their metabolism
metabolites.gs = lapply(1:length(unique(df$cpd_name)), function(x) df$ncbi_name[which(df$cpd_name == unique(df$cpd_name)[x])])
namesmetabolites.gs) = unique(df$cpd_name)
metabolites.gs$'D-Glyceraldehyde 3-phosphate' ## test

## Taking your df from your original post, create ranks.
ranks = setNames(df$logFC, df$X)

## Now you have both the gene sets and the ranks. You can run GSEA
res = fgseametabolites.gs, ranks, nperm = 1000)

Note: depending on the output from your differential expression analysis, I would use (i) the t-statistic, which is simply logFC / sd(logFC); (ii) the logFC, as I show above; or (iii) separate your df into 'up' or 'down' regulated, and use the p-values. I'm not an expert on this so I'm not sure which option is best, but you can play around with different analyses.

EDIT: For some reason my parenthesis are not showing: the proper code is "names(metabolites.gs)" and "fgsea(metabolites.gs)", without the asterisks.

ADD COMMENT
0
Entering edit mode

@ivanarg2 thanks for your help. how then I can use my gene list to find the metabolites?

ADD REPLY
0
Entering edit mode

You can then run regular gene set enrichment analysis, as below:

fgseametabolites.gs, ranks, nperm = 1000) ## ranks is from your differential expression analysis, i.e., a t-statistic (logFC/logFC.sd)

The resulting list of 'pathways' (really, metabolites), will tell you the enrichment of metabolites in different conditions. Is this what you were asking?

ADD REPLY
0
Entering edit mode

@ivanarg2 here is what I undestood. You extract the gene and metabolits assigned to them from KEGG. Then I have a list of genes which are upregulated and I already did the differential experission , so I have their names. I want to just see what metabolites are upregulated with the list of the genes that I have . Is that clear?

ADD REPLY
0
Entering edit mode

Right. So you already have the two necessary parameters to run GSEA: (i) a list of gene sets (which usually refers to a biological pathway and the genes associated with those pathways, but in this case the gene sets refer to metabolites, and the genes involved in their metabolism); and (ii) a ranking of genes based on their differential expression. Just run fgsea on your list of differentially expressed genes and metabolites.gs.

ADD REPLY
0
Entering edit mode

@ivanarg2 Can you help me to apply it on this list ? "GPX3", "GLRX", "LBP", "CRYAB", "DEFB1", "HCLS1", "SOD2", "HSPA2", "ORM1", "IGFBP1", "PTHLH", "GPC3", "IGFBP3","TOB1", "MITF", "NDRG1", "NR1H4", "FGFR3", "PVR", "IL6", "PTPRM", "ERBB2", "NID2", "LAMB1", "COMP", "PLS3", "MCAM", "SPP1", "LAMC1", "COL4A2", "COL4A1", "MYOC", "ANXA4", "TFPI2", "CST6", "SLPI", "TIMP2", "CPM", "GGT1", "NNMT", "MAL", "EEF1A2", "HGD", "TCN2", "CDA", "PCCA", "CRYM", "PDXK","STC1", "WARS", "HMOX1", "FXYD2", "RBP4", "SLC6A12", "KDELR3", "ITM2B"

ADD REPLY
0
Entering edit mode

To run GSEA you need a vector with (i) gene names, as you provide, and (ii) some kind of ranking variable, like a fold change, a p value, or something like that. Do you have this info from your differential expression analysis?

ADD REPLY
0
Entering edit mode

@ivanarg2 Please look at my question. I provided the data there in a R format

ADD REPLY
0
Entering edit mode

I edited my answer based on your edit.

ADD REPLY
0
Entering edit mode

@ivanarg2 I did not check it yet but I accepted it. however, you have few df in your code which maybe should be named differently otherwise it will mess up the analysis!

ADD REPLY
0
Entering edit mode

@ivanarg2 do you know anyway to plot the metabolic pathways found by your strategy?

ADD REPLY
0
Entering edit mode

The fgsea package has a few good functions for plotting the results of fgsea analysis

ADD REPLY

Login before adding your answer.

Traffic: 2055 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6