Hi guys,
I am working with a bacterial genome (454) at the moment and would like to assign COG functional classification for all the 5000 or so genes.
I have used the 'rast' web server to annotate this genome. I have written to NCBI about the cognitor program but they tell me that it is no longer supported and that there is no way to do COG searches in batch mode.
It would be fantastic if any of you could share your experiences on this.
Thanks !
So far as I know, there is no easy web-based tool for COG assignment. As Michael suggested, you could fetch PSSMs for the COG database from the FTP site and use rpsblast, or you could download the fasta format file myva from the COG FTP site and format it for search yourself.
My impression is that NCBI lacks either the resources or the inclination to support COGs: it barely features in their A-Z resources list and is not regularly updated. You may want to look at KEGG instead. Annotation of protein domains using e.g. HMMER or InterPro seems to be a more popular approach than functional assignment to the entire protein sequence, these days.
Well, KEGG provides an automatic annotation server. And the KO links to COG IDs, if you really want COG. However, my feeling is that KEGG provides better options for functional annotation than COG, which never really felt as though it was widely adopted or well-supported.
Agreed, NCBI is favoring their CD-database over the alternatives. But COG seems to have been updated on May 8th, so it should be fine. But how would you apply KEGG to this kind of problem?
I have been frantically searching the web for tools that let us do this. I found this paper describing the 'Augur proteomics pipeline', which claims to do it, but it is currently offline. Thanks for all your input, guys.
As far as I know, there are two possible ways to solve this:
Use an entirely automatic gene annotation pipeline. I know Augustus+ for eukaryotes; I'm sure someone can point you in the right direction for bacteria.
Do gene prediction and classification separately. If I understand you correctly, you already have the predicted genes and just want to classify them automatically.
One possibility for this would be rpsblast (which I'm also currently working with; if there are alternatives, please let me know).
[...] that there is no way to do COG searches in batch mode
This is definitely not correct. For example, use rpsblast with the COG database:
download and install NCBI BLAST+
download the COG database as .smp files from NCBI (cdd.tar.gz here, see README for details)
create a COG-only rpsblast database (cf. this tutorial, ignore the BioPython part)
BLAST your predicted genes against your newly created database with the rpstblastn executable and interpret the PSSM matches (easiest way: highest COG match with e-value < e_max is a specific hit; note that frame)
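The "highest match below an e-value cutoff" rule from the last step could be sketched roughly as below. This is only an illustration, assuming the search was run with tabular output (-outfmt 6, where column 11 is the e-value and column 12 the bit score). The function name best_cog_hits, the default cutoff of 1e-5, and the sample rows are all made up for the example. In real output the subject column is typically a CDD identifier that still has to be mapped back to a COG accession, which is not shown here.

```python
import csv
import io


def best_cog_hits(tabular_lines, e_max=1e-5):
    """Return {query_id: (subject_id, evalue, bitscore)} with the best hit per query.

    Assumes BLAST tabular format (-outfmt 6): tab-separated columns with the
    query ID first, subject ID second, e-value 11th and bit score 12th.
    """
    best = {}
    for row in csv.reader(tabular_lines, delimiter="\t"):
        # Skip blank lines and comment lines (present with -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Discard hits that do not pass the e-value cutoff.
        if evalue >= e_max:
            continue
        # Keep the hit with the highest bit score for each query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up example rows, only to show the expected input shape.
example = io.StringIO(
    "gene1\tCOG0001\t35.0\t200\t130\t2\t1\t200\t5\t204\t1e-30\t120\n"
    "gene1\tCOG0002\t28.0\t150\t108\t3\t10\t160\t1\t150\t1e-10\t60\n"
    "gene2\tCOG0003\t22.0\t80\t62\t1\t1\t80\t1\t80\t0.5\t20\n"
)
hits = best_cog_hits(example)
for query, (subject, evalue, bitscore) in hits.items():
    print(query, subject, evalue, bitscore)
```

Genes with no hit under the cutoff (gene2 in the made-up data) simply do not appear in the result, so they can be reported as unassigned.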
Sorry, can I ask some questions? I still don't know how to start my search. I have downloaded BLAST+ and the CDD database and read the user manual, but I can't figure out where I should type those commands. After installing BLAST+, I just see a group of BLAST programs, which I find difficult to follow or understand. I don't know any programming language, but I have to figure out how to use BLAST to classify my identified results, because I don't have the time or patience to search COGs on the website one by one. Please, can somebody help me and tell me how to start my search? I tried to read the guide on NCBI, but as a student from a non-English-speaking country, there are too many words to read and it makes me impatient. Sorry, I have to say: compared with other databases, NCBI is very, very hard to understand... T^T.
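On where to type the commands: the BLAST+ programs have no graphical window. They are run by typing a command line in a terminal (Command Prompt on Windows, Terminal on macOS or Linux). As a rough sketch, the snippet below only assembles and prints such a command line. The file names proteins.faa and cog_hits.tsv, the database prefix Cog, and the helper name rpsblast_command are placeholders I made up; the flags used (-query, -db, -evalue, -outfmt, -out) are standard BLAST+ options. Note that rpsblast expects protein queries; for nucleotide gene sequences the rpstblastn program mentioned above takes the same options.

```python
import subprocess  # only needed if the commented-out run() call is enabled


def rpsblast_command(query_fasta, db_prefix, out_path, e_max=1e-5):
    """Build an rpsblast command line as a list of arguments.

    All paths are placeholders; db_prefix is the name given to the
    COG-only profile database when it was created.
    """
    return [
        "rpsblast",
        "-query", query_fasta,   # FASTA file with all predicted proteins
        "-db", db_prefix,        # the COG-only rpsblast database
        "-evalue", str(e_max),   # report only hits below this e-value
        "-outfmt", "6",          # tab-separated output, one hit per line
        "-out", out_path,        # file that will receive the results
    ]


cmd = rpsblast_command("proteins.faa", "Cog", "cog_hits.tsv")
# This is the line you would type in a terminal:
print(" ".join(cmd))

# To launch the search from Python instead (requires BLAST+ to be installed
# and on the PATH), uncomment the next line:
# subprocess.run(cmd, check=True)
```

Because the query file holds all sequences at once, a single run covers the whole genome, so nothing has to be submitted one gene at a time.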
There is a program that does automated COG assignment: look into MEGAN. It's metagenomics software, but basically you could just run a BLAST on all your sequences and input the results. It automatically extracts the COGs and gives you a chart of them for your reads (or genes, if you want to assemble them first).
Good Luck!
(P.S. You might have to clean a few up depending on how well annotated the hits are that you get back from BLAST, but it's definitely a lot quicker to do it this way).
Sorry, but I tried MEGAN and found that it already needs a BLAST result and just extracts the COG hits from it (if present).
It definitely does not seem to be the right program if we start with sequence data and want to assign COGs :(
I think you could use the online CD-Search tool and select COG as the database to search, to get COG annotations for your genome.