Genome-Wide COG Assignment
4
10
14.5 years ago
Neo ▴ 200

Hi guys, I am working with a bacterial genome (454) at the moment and would like to assign COG functional classifications to all of its 5000 or so genes. I have used the RAST web server to annotate this genome. I wrote to NCBI about the COGNITOR program, but they told me that it is no longer supported and that there is no way to do COG searches in batch mode. It would be fantastic if any of you could share your experiences with this. Thanks!

genome annotation • 21k views
0

I think you could use the online CD-Search tool and select the COG database to get COG annotations for your genome.

11
14.5 years ago
Neilfws 49k

So far as I know, there is no easy web-based tool for COG assignment. As Michael suggested, you could fetch the PSSMs for the COG database from the FTP site and use rpsblast, or you could download the FASTA-format file myva from the COG FTP site and format it as a BLAST database yourself.
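
As a rough sketch of the second option (formatting myva yourself), the two BLAST+ commands could be assembled as below. The file names `myva`, `proteins.faa` and `myva_hits.tsv` and the e-value cutoff are placeholders assumed for illustration, not values taken from the COG site. The code only builds and prints the commands, so nothing is executed until you pass them to `subprocess.run` or paste them into a terminal.

```python
import shlex


def makeblastdb_cmd(fasta_path, db_name):
    """Command that formats a protein FASTA file as a BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def blastp_cmd(query_faa, db_name, out_tsv, evalue=1e-5):
    """Command that searches predicted proteins against the formatted database."""
    return [
        "blastp",
        "-query", query_faa,
        "-db", db_name,
        "-evalue", str(evalue),
        "-outfmt", "6",  # tabular output, one hit per line
        "-out", out_tsv,
    ]


if __name__ == "__main__":
    # Placeholder file names (assumptions); replace with your own paths.
    for cmd in (makeblastdb_cmd("myva", "myva"),
                blastp_cmd("proteins.faa", "myva", "myva_hits.tsv")):
        print(shlex.join(cmd))
```

The hits from such a search are proteins, not COGs, so a further step that maps each hit protein to its COG would still be needed.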

My impression is that NCBI lacks either the resources or the inclination to support COGs: it barely features in their A-Z resources list and is not regularly updated. You may want to look at KEGG instead. Annotation of protein domains using e.g. HMMER or InterPro seems to be a more popular approach than functional assignment to the entire protein sequence, these days.

2

Well, KEGG provides an automatic annotation server. And the KO links to COG IDs, if you really want COG. However, my feeling is that KEGG provides better options for functional annotation than COG, which never really felt as though it was widely adopted or well-supported.

0

Agreed, NCBI is favoring their CD-database over the alternatives. But COG seems to have been updated on May 8th, so it should be fine. But how would you apply KEGG to this kind of problem?

0

I have been frantically searching the web for tools that let us do this. I found a paper describing the 'Augur proteomics pipeline', which claims to do it but is currently offline. Thanks for all your input, guys.

0

Excellent suggestions, Neil. Your points about COG are exactly right.

0

It seems a web-based tool for COG assignment would be very helpful. Does anyone know of someone working on that? :D

9
14.5 years ago

As far as I know, there are two possible ways to solve this:

  1. Use an entirely automatic gene annotation pipeline. I know Augustus+ for eukaryotes; I'm sure someone can point you in the right direction for bacteria.

  2. Do gene prediction and classification separately. If I understand you correctly, you already have the predicted genes and just want to classify them automatically.

One possibility for this would be rpsblast (which I'm also currently working with; if there are alternatives, please let me know).

[...] that there is no way to do COG searches in batch mode

This is definitely not correct. For example, use rpsblast with the COG database:

  • download and install NCBI BLAST+
  • download the COG database as .smp files from NCBI (cdd.tar.gz here, see README for details)
  • create a COG-only rpsblast database (cf. this tutorial, ignore the BioPython part)
  • BLAST your predicted genes against your newly created database with the rpstblastn executable and interpret the PSSM matches (easiest way: highest COG match with e-value < e_max is a specific hit; note that frame)
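
The last step (keeping the best COG hit per gene below an e-value cutoff) can be sketched as follows. This assumes the search was run with tabular output (`-outfmt 6`), whose default columns are query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value and bit score. The function name `best_hits`, the cutoff `e_max=1e-5` and the example IDs are made up for illustration. Note that rpstblastn takes nucleotide queries; for protein sequences, rpsblast is the counterpart.

```python
import csv
import io


def best_hits(tabular_lines, e_max=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)} for the top hit per query.

    Hits with e-value >= e_max are ignored; the highest bit score wins,
    and ties are resolved in favor of the first hit seen.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue >= e_max:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Made-up example rows in -outfmt 6 column order (not real results).
    example = io.StringIO(
        "gene1\tCOG0001\t45.0\t200\t90\t2\t1\t600\t5\t204\t1e-40\t150\n"
        "gene1\tCOG0002\t30.0\t180\t110\t4\t10\t550\t8\t190\t1e-10\t60.5\n"
        "gene2\tCOG0003\t25.0\t90\t60\t3\t1\t270\t1\t90\t0.5\t22.1\n"
    )
    for gene, (cog, evalue, bits) in best_hits(example).items():
        print(gene, cog, evalue, bits)
```

In real output from a CDD-derived database the subject IDs are likely to be PSSM identifiers rather than COG names, so they may need to be mapped to COG accessions in a further step.
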
0
14.2 years ago
Carl • 0

Sorry, can I ask some questions? I still don't know how to start my search. I have downloaded BLAST+ and the CDD database and read the user manual, but I can't figure out where I should type the commands. After installing BLAST+, all I see is a group of BLAST programs, which I find difficult to follow. I don't know any programming language, but I have to figure out how to use BLAST to classify my identified results, because I don't have the time or patience to search COGs on the website one by one. Can somebody please help me and tell me how to start my search? I tried to read the guide on NCBI, but as a student from a non-English-speaking country I find there is too much text to read. Sorry to say it, but compared with other databases, NCBI is very, very hard to understand... T^T

2

Please ask this as a new question in a separate thread: https://www.biostars.org/p/new/post/

0

The whole problem is that BLAST+ is the new version! They changed the commands we should use.
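
For anyone following older tutorials, the renaming between legacy BLAST and BLAST+ can be illustrated with a small lookup. The sketch covers only a handful of mappings (`blastall -p blastp` became the standalone `blastp`, `-i`/`-d`/`-e`/`-o` became `-query`/`-db`/`-evalue`/`-out`, and tabular output `-m 8` became `-outfmt 6`; similarly `formatdb` was replaced by `makeblastdb`). The helper `translate_blastall` is hypothetical, and the file names in the example are placeholders.

```python
# Legacy blastall option -> BLAST+ option (only a few common ones).
LEGACY_TO_PLUS_OPTIONS = {
    "-i": "-query",
    "-d": "-db",
    "-e": "-evalue",
    "-o": "-out",
}


def translate_blastall(args):
    """Translate a simple 'blastall -p PROGRAM ...' argument list to BLAST+.

    Every option is expected to be followed by a value. Options outside
    this small sketch raise ValueError instead of being guessed.
    """
    if not args or args[0] != "blastall":
        raise ValueError("expected a blastall command")
    program = None
    translated = []
    it = iter(args[1:])
    for opt in it:
        value = next(it, None)
        if value is None:
            raise ValueError(f"missing value for {opt}")
        if opt == "-p":
            program = value  # e.g. blastp, blastn, tblastn
        elif opt == "-m" and value == "8":
            translated += ["-outfmt", "6"]  # tabular output
        elif opt in LEGACY_TO_PLUS_OPTIONS:
            translated += [LEGACY_TO_PLUS_OPTIONS[opt], value]
        else:
            raise ValueError(f"option not handled in this sketch: {opt} {value}")
    if program is None:
        raise ValueError("blastall needs -p PROGRAM")
    return [program] + translated


if __name__ == "__main__":
    old = ["blastall", "-p", "blastp", "-i", "proteins.faa", "-d", "myva",
           "-e", "1e-5", "-m", "8", "-o", "hits.tsv"]
    print(" ".join(translate_blastall(old)))
```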

0
14.0 years ago
Nep • 0

There is a program that does automated COG assignment: look into MEGAN. It's metagenomics software, but basically you could just run BLAST on all your sequences and feed in the results. It automatically extracts the COGs and gives you a chart of them for your reads (or genes, if you want to assemble them first). Good luck! (P.S. You might have to clean a few up depending on how well annotated your BLAST hits are, but it's definitely a lot quicker to do it this way.)

0

Sorry, but I tried MEGAN and found that it needs an existing BLAST result and just extracts the COG hits from it (if present). It definitely does not seem to be the right program if we start with sequence data and want to assign COGs :(
