Gene list pathway classification and visualization
4.1 years ago

Hi, I have a list of genes from a Pseudomonas strain with official gene identifiers, and I have been looking for a resource to sort these genes according to their metabolic pathways. If possible, I'd also like to represent the results in some sort of graph, such as a circular plot or a barplot, but this is secondary. This strain is not supported by databases like KEGG, but it is included in NCBI. I have tried R packages like clusterProfiler and web servers like DAVID 6.8, but I have not been able to make them work so far.

Any tips/advice would be greatly appreciated. Cheers

PS: I have recently started learning R and UNIX, so I'm not very proficient.

genome gene pathway • 1.3k views

Fun4me might help: https://sourceforge.net/projects/fun4me/ Let me know if you have any questions.

This package includes a few programs for rapid functional annotation of metagenomic sequences: 1) gene prediction by FragGeneScan; 2) similarity search by RAPSearch2; 3) functional annotation in GO (Gene Ontology) and EC (Enzyme Commission) based on the similarity search results; 4) metabolic pathway reconstruction from EC numbers by MinPath.

Inputs: just sequencing reads (or assemblies)
Outputs: protein-coding genes (or gene fragments); similarity search results; functional annotations (in GO and EC); metabolic pathways.

4.1 years ago
xanderpico ▴ 580

There are indeed a lot of tools and resources to choose from. Three important considerations are (1) methods, (2) update frequency, and (3) coverage.

  1. METHODS. The methods for pathway classification you are asking about are generally referred to as functional enrichment analyses and include over-representation analysis (ORA), gene set enrichment analysis (GSEA), and topological analysis. Each resource implements one or more of these methods, usually with a bit of customized pre- or post-filtering. clusterProfiler, for example, supports both ORA and GSEA. There are review papers that compare these methods, but as there aren't really any gold-standard benchmarks, many folks just try more than one method. In any case, it's important to know which method you are using with a given tool, and its caveats.

  2. UPDATES. In terms of sources of pathway annotations: KEGG has not been significantly updated since around 2011 (https://www.kegg.jp/kegg/docs/upd_map.html), BioCyc is updated around 3 times a year (https://biocyc.org/release-notes.shtml), and WikiPathways is significantly updated every month (http://releases.wikipathways.org)! So this is definitely something to take into account when choosing a resource to plug into a given method. In terms of tools: DAVID hasn't been updated since 2016 (https://david.ncifcrf.gov/gene2gene.jsp), which means its GO and pathway annotations will all be outdated. clusterProfiler is under active development (https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) and lets you supply your own GMT file, from WikiPathways or elsewhere, to perform ORA or GSEA.

  3. COVERAGE. If you're studying human or mouse models, then pretty much any resource will do. But you are working with a Pseudomonas strain, which will dramatically limit your options. In general, I would suggest BioCyc for bacterial species coverage. It looks like they currently support two different Pseudomonas strains. You might also consider translating your genes to E. coli identifiers to expand your options to other resources.
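To make ORA with a custom GMT file a bit more concrete, here is a minimal sketch in plain Python (standard library only). It is not how clusterProfiler or any other tool implements it; it only illustrates the idea: read gene sets from GMT-formatted text (set name, description, then gene IDs, tab-separated), then run one hypergeometric test per pathway against a background gene list. The pathway names, gene IDs, and background below are made-up toy values, and multiple-testing correction is left out for brevity, so treat the p-values as illustrative only.

```python
import io
from math import comb


def parse_gmt(handle):
    """Read GMT lines: set name, description, then gene IDs (tab-separated)."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K belong to the pathway."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def ora(query, gene_sets, background):
    """Over-representation analysis: one hypergeometric test per gene set."""
    background = set(background)
    query = set(query) & background
    results = []
    for name, members in gene_sets.items():
        members = members & background
        overlap = query & members
        if not overlap:
            continue  # nothing from the query list falls in this pathway
        p = hypergeom_upper_tail(len(overlap), len(members), len(query), len(background))
        results.append((name, len(overlap), len(members), p))
    return sorted(results, key=lambda r: r[3])  # smallest p-value first


# Toy data (hypothetical): two pathways, a 20-gene background, a 4-gene query list.
toy_gmt = io.StringIO(
    "pathway_A\ttoy description\tg1\tg2\tg3\tg4\n"
    "pathway_B\ttoy description\tg5\tg6\tg7\tg8\tg9\tg10\n"
)
gene_sets = parse_gmt(toy_gmt)
background = [f"g{i}" for i in range(1, 21)]
query = ["g1", "g2", "g3", "g5"]

results = ora(query, gene_sets, background)
for name, k, size, p in results:
    # The '#' run is a crude text barplot of the number of query genes per pathway.
    print(f"{name}\t{k}/{size}\tp={p:.4f}\t{'#' * k}")
```

With a real gene list, the background should be all annotated genes of the strain rather than a toy list, and the per-pathway counts in `results` are what you would feed into a barplot.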


Thank you very much :). I will try your recommendations and see what I can get out of this list.
