Hi,
I've downloaded the coding genes of more than 10K bacterial genomes (reference or representative) from RefSeq. example
I would like basic gene annotation such as KEGG pathways and COG functional categories (or any other scheme that assigns genes to basic functional categories/pathways). I can of course run an annotation pipeline like eggNOG, but it would take a long time to finish, and I guess there should be a database with annotations of these coding genes.
Is there a ready-made functional annotation of prokaryotic reference genomes?
Thanks
1 - your link does not exist.
2 - What and how exactly did you download from RefSeq?
3 - You know that...
... is RefSeq, right?
I've updated the link. I downloaded the *_cds_from_genomic.fna.gz files using https://github.com/kblin/ncbi-genome-download . Yes, I know, but I need functional annotation of the coding genes, which is missing in RefSeq.
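As a side note, the headers in a *_cds_from_genomic.fna file usually carry bracketed key=value fields (gene, locus_tag, protein, protein_id, ...), so a product description per CDS can be pulled out directly, although this is not a KEGG or COG assignment. A minimal Python sketch, assuming that header layout; the example header is only illustrative and `parse_cds_header` is a hypothetical helper name, not part of any tool mentioned here.

```python
import re

# Matches bracketed key=value fields such as "[protein=thr operon leader peptide]".
# Limitation: a value that itself contains square brackets is skipped.
FIELD_RE = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def parse_cds_header(header):
    """Split one FASTA header line into its sequence ID and bracketed fields."""
    header = header.lstrip(">").strip()
    seq_id = header.split(None, 1)[0]
    fields = dict(FIELD_RE.findall(header))
    return seq_id, fields


# Illustrative header in the assumed layout.
example = (
    ">lcl|NC_000913.3_cds_NP_414542.1_1 [gene=thrL] [locus_tag=b0001] "
    "[protein=thr operon leader peptide] [protein_id=NP_414542.1] "
    "[location=190..255] [gbkey=CDS]"
)
seq_id, fields = parse_cds_header(example)
print(seq_id, fields.get("protein_id"), fields.get("protein"))
```

The `protein_id` field is probably the most useful one to keep, since it gives a stable key for joining against whatever annotation table you end up with.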
To my knowledge, the Integrated Microbial Genomes (IMG) database (link) is the only repository with functional annotation (KEGG, MetaCyc, COG, Pfam): example. The only problems are: 1) your search is limited to 500 genomes (if you are a registered user); 2) not every genome in RefSeq can be found in the IMG database.
Is there anything you can do to reduce the number of genomes and speed up the annotation process? Do you really need 10k genomes? Finally, as an alternative to eggNOG, you should consider KofamScan (link); it is much faster than eggNOG.
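One possible way to cut the workload, sketched here as an assumption rather than something the tools above require: many CDS sequences are identical across closely related genomes, so you could annotate each unique sequence once and map the result back to every genome that carries it. The Python below groups records by a hash of the sequence; `read_fasta` and `dereplicate` are hypothetical helper names and the toy records are made up.

```python
import hashlib


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dereplicate(records):
    """Group sequence IDs by the hash of their upper-cased sequence."""
    representatives = {}  # digest -> (first ID seen, sequence)
    members = {}          # digest -> all IDs sharing that exact sequence
    for header, seq in records:
        seq_id = header.split(None, 1)[0]
        digest = hashlib.sha256(seq.upper().encode()).hexdigest()
        representatives.setdefault(digest, (seq_id, seq))
        members.setdefault(digest, []).append(seq_id)
    return representatives, members


# Toy input: records "a" and "b" hold the same sequence, "c" differs.
toy = [">a desc", "ATGAAA", "TAA", ">b", "atgaaataa", ">c", "ATGCCCTAA"]
reps, members = dereplicate(read_fasta(toy))
print(len(reps), sorted(len(ids) for ids in members.values()))
```

For the real files you would pass a handle opened with `gzip.open(path, "rt")` to `read_fasta`, write only the representatives to a new FASTA, annotate that, and use `members` to propagate the annotation. This only collapses exact duplicates; near-identical sequences would need a clustering tool.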
Your "functional annotations of the coding genes" are in RefSeq full reports, not in the sequences...
Can you point me to that full report (even for a single file, and I'll take it from there)? I failed to find the data I'm looking for in any of the RefSeq files on the FTP site or through the records on NCBI, which is what motivated me to ask here. Thanks