I have to interpret the altered metabolic pathways from an RNA-seq experiment on a non-model bacterium. We are working with a draft genome of 47 contigs. I realised that, because I don't even know its carbon metabolism or its metabolic capabilities, it makes no sense to try to interpret a list of enriched GO terms when I don't even know whether glycolysis is completely functional.
I could provide some tables of enriched GO terms and try to publish my results, but it is difficult to trust them without knowing the metabolism of this bacterium.
KEGG, MetaCyc, COG and Pfam annotations were retrieved from the IMG/ER database, so they were assigned automatically by its pipeline. From some preliminary analysis, I have observed that very few genes have hits to databases such as KEGG or MetaCyc: 1/3 of the proteome or less. However, my custom GO database assigns GO terms to about 2/3 of the genome, and this makes a difference when performing enrichments. Examples are ABC transporters and transposases: they are caught by GO terms but not by KEGG or MetaCyc.
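To make the effect of annotation coverage concrete, here is a minimal Python sketch. It assumes the annotations are available as simple gene-to-term mappings; the gene IDs, term labels, and the helper names `annotation_coverage` and `enrichment_pvalue` are all hypothetical and are not taken from IMG/ER or any enrichment package. It computes the fraction of genes annotated in each database and a one-sided hypergeometric p-value, using only annotated genes as the background.

```python
from math import comb

def annotation_coverage(all_genes, gene_to_terms):
    """Fraction of genes carrying at least one term in a given database."""
    annotated = sum(1 for g in all_genes if gene_to_terms.get(g))
    return annotated / len(all_genes)

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric test, P(X >= k).

    N: background size, K: background genes with the term,
    n: differentially expressed (DE) genes in the background,
    k: DE genes with the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Toy data (hypothetical): 9 genes, 6 with GO terms, 3 with KEGG terms.
all_genes = [f"g{i}" for i in range(1, 10)]
go_terms = {
    "g1": ["go_transport"], "g2": ["go_transport"], "g3": ["go_transport"],
    "g4": ["go_transposition"], "g5": ["go_transposition"], "g6": ["go_other"],
}
kegg_terms = {"g1": ["kegg_x"], "g2": ["kegg_x"], "g3": ["kegg_y"]}

go_cov = annotation_coverage(all_genes, go_terms)
kegg_cov = annotation_coverage(all_genes, kegg_terms)

# Enrichment of "go_transport" among DE genes, with the background
# restricted to genes that have at least one GO term.
de_genes = {"g1", "g2", "g3"}
background = [g for g in all_genes if go_terms.get(g)]
with_term = {g for g in background if "go_transport" in go_terms[g]}
de_in_bg = de_genes & set(background)
p = enrichment_pvalue(
    k=len(de_in_bg & with_term), n=len(de_in_bg),
    K=len(with_term), N=len(background),
)
print(f"GO coverage: {go_cov:.2f}, KEGG coverage: {kegg_cov:.2f}, p = {p:.3f}")
```

Genes without any term in a database cannot contribute to an enrichment in that database, so a gene family that is missed by KEGG or MetaCyc simply never appears in those results.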
I would like to know what the best tool for my purpose is. I need to determine the metabolic capabilities of this bacterium and describe the altered metabolic pathways.
Thanks,
Bernardo
I would start by submitting the draft genome to BioCyc via the Pathway Tools software. BioCyc semi-automates the interpretation of the enrichment because you can easily see whether the genes in a pathway are present in your genome or not.
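The presence/absence check can also be sketched by hand. The Python below is only an illustration and is not how Pathway Tools scores pathways. It assumes you have the set of EC numbers assigned to your genome (the `genome_ecs` set here is made up), and it uses a simplified ten-step list for glycolysis in which each step accepts alternative EC numbers. The function name `pathway_completeness` is hypothetical. Note that many bacteria import glucose through the PTS system, so a missing glucokinase does not by itself mean glycolysis is broken.

```python
# Simplified glycolysis (Embden-Meyerhof-Parnas) steps as
# (step name, set of EC numbers that can satisfy the step).
GLYCOLYSIS_STEPS = [
    ("glucokinase", {"2.7.1.2"}),
    ("glucose-6-phosphate isomerase", {"5.3.1.9"}),
    ("6-phosphofructokinase", {"2.7.1.11"}),
    ("fructose-bisphosphate aldolase", {"4.1.2.13"}),
    ("triose-phosphate isomerase", {"5.3.1.1"}),
    ("glyceraldehyde-3-phosphate dehydrogenase", {"1.2.1.12"}),
    ("phosphoglycerate kinase", {"2.7.2.3"}),
    ("phosphoglycerate mutase", {"5.4.2.11", "5.4.2.12"}),
    ("enolase", {"4.2.1.11"}),
    ("pyruvate kinase", {"2.7.1.40"}),
]

def pathway_completeness(steps, genome_ecs):
    """Return (fraction of steps covered, names of uncovered steps)."""
    missing = [name for name, alternatives in steps
               if not (alternatives & genome_ecs)]
    covered = len(steps) - len(missing)
    return covered / len(steps), missing

# Hypothetical annotation: every step except 6-phosphofructokinase.
genome_ecs = {
    "2.7.1.2", "5.3.1.9", "4.1.2.13", "5.3.1.1", "1.2.1.12",
    "2.7.2.3", "5.4.2.12", "4.2.1.11", "2.7.1.40",
}

fraction, missing = pathway_completeness(GLYCOLYSIS_STEPS, genome_ecs)
print(f"Glycolysis steps covered: {fraction:.0%}; missing: {missing}")
```

With a draft genome, a missing step may reflect a gene split across contigs or a missed annotation rather than a true absence, so any gap flagged this way is worth checking manually, for example with a BLAST search against the contigs.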
When you submit to KEGG, do you submit to KEGG KAAS? Do you submit the complete 47 contigs, or the genes predicted from these contigs?