Basic annotation of refseq bacterial genomes
3.5 years ago
biobiu ▴ 150

Hi,

I've downloaded the coding genes of more than 10K bacterial genomes (reference or representative) from RefSeq. example

I would like basic gene annotation such as KEGG pathways and COG functional categories (or any other scheme that assigns genes to basic functional categories/pathways). I can of course run an annotation pipeline like eggNOG, but it would take a long time to finish, and I guess there should already be a database with annotation of these coding genes.

Is there a ready-made functional annotation of prokaryotic reference genomes?

Thanks

annotation bacteria • 1.6k views

1 - Your link does not work.

2 - What and how exactly did you download from RefSeq?

3 - You know that "a database with annotation of these coding genes" is RefSeq, right?


I've updated the link. I downloaded the *_cds_from_genomic.fna.gz files using https://github.com/kblin/ncbi-genome-download . Yes, I know, but I need functional annotation of the coding genes, which is missing in RefSeq.


To my knowledge, the Integrated Microbial Genomes (IMG) database (link) is the only repository with functional annotation (KEGG, MetaCyc, COG, Pfam): example. There are two limitations: 1) a search is limited to 500 genomes (if you are a registered user); 2) not every genome in RefSeq can be found in IMG.

Is there anything you can do to reduce the number of genomes and speed up the annotation process? Do you really need 10k genomes? Finally, as an alternative to eggNOG, you should consider KofamScan (link); it is much faster than eggNOG.
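If you do go the KofamScan route, collecting the results per gene is simple. Here is a minimal sketch that assumes the tab-separated "mapper" style output (gene ID first, then zero or more KO numbers on the same line); `read_ko_mapper` and `annotated_fraction` are names I made up for illustration, they are not part of KofamScan.

```python
from typing import Dict, Iterable, List


def read_ko_mapper(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect KO assignments per gene from tab-separated lines.

    Assumed layout (not verified against every KofamScan version):
    gene_id<TAB>KO[<TAB>KO...], or just gene_id when nothing was assigned.
    A gene may also appear on several lines, one KO per line.
    """
    gene_to_kos: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        gene = parts[0].strip()
        if not gene:
            continue  # skip blank lines
        kos = gene_to_kos.setdefault(gene, [])
        for ko in parts[1:]:
            ko = ko.strip()
            if ko and ko not in kos:
                kos.append(ko)
    return gene_to_kos


def annotated_fraction(gene_to_kos: Dict[str, List[str]]) -> float:
    """Fraction of genes that received at least one KO."""
    if not gene_to_kos:
        return 0.0
    hits = sum(1 for kos in gene_to_kos.values() if kos)
    return hits / len(gene_to_kos)
```

With 10k genomes you would call this once per genome's output file and keep only the per-genome KO sets, which stay small.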


Your "functional annotations of the coding genes" are in RefSeq full reports, not in the sequences...


Can you point me to that full report (even a specific file, and I'll take it from there)? I failed to find the data I'm looking for in any of the RefSeq files on the FTP site or through the records on NCBI, which was my motivation for asking here. Thanks

3.5 years ago
Mensur Dlakic ★ 28k

Functional annotation is missing in the files you downloaded, but it is in the *_protein.faa.gz files. That file type contains protein sequences, and for most of them there will be some kind of meaningful annotation in the FASTA header. If you use the same program but download the files I indicated, it should work. To be certain, I suggest you download and unpack one of those .faa files and see whether the annotation in it is useful to you.

PS You may want to look at all the files in a given assembly directory, as they contain different types of information that may be helpful.
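To check quickly what those headers give you, something like the following should do. It is only a sketch and assumes the usual RefSeq protein header shape, `>ACCESSION product name [Organism]`; the function name `parse_faa_header` is hypothetical.

```python
from typing import Tuple


def parse_faa_header(line: str) -> Tuple[str, str, str]:
    """Split a protein FASTA header into (accession, product, organism).

    Assumes the header looks like '>ACCESSION product name [Organism]'.
    The organism is returned as an empty string when no trailing
    bracketed part is present.
    """
    text = line.strip()
    if not text.startswith(">"):
        raise ValueError("not a FASTA header: %r" % line)
    accession, _, rest = text[1:].partition(" ")
    rest = rest.strip()
    organism = ""
    if rest.endswith("]") and " [" in rest:
        # Split on the last ' [' so brackets inside the product name survive.
        rest, _, tail = rest.rpartition(" [")
        organism = tail[:-1]
    return accession, rest.strip(), organism
```

To scan a whole file, open it with `gzip.open(path, "rt")` and pass every line that starts with `>` to the function. Counting how many products are just "hypothetical protein" tells you fast whether the headers are informative enough for your purpose.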
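One of the other files that may be worth a look is the *_feature_table.txt.gz, a tab-separated table with one row per feature. Below is a rough sketch for pulling the product name per locus tag out of it. The column names used here (`feature`, `name`, `locus_tag`) and the leading "# " on the header line are my assumptions about that file, so check them against a real file first; `cds_products` is a made-up helper name.

```python
import csv
from typing import Dict, Iterable


def cds_products(lines: Iterable[str]) -> Dict[str, str]:
    """Map locus_tag -> product name for CDS rows of a feature table.

    Assumes a tab-separated table whose first line is a header that
    starts with '# feature' and includes 'name' and 'locus_tag' columns.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return {}
    header[0] = header[0].lstrip("# ")  # '# feature' -> 'feature'
    col = {name: i for i, name in enumerate(header)}
    products: Dict[str, str] = {}
    for row in reader:
        if len(row) < len(header):
            continue  # skip short or blank rows
        if row[col["feature"]] != "CDS":
            continue
        locus = row[col["locus_tag"]]
        if locus:
            products[locus] = row[col["name"]]
    return products
```

For a gzipped file, `gzip.open(path, "rt", newline="")` returns a handle you can pass straight in.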


Thanks! Actually, the faa.gz files seem to have less annotation in their headers than the cds_from_genomic.fna.gz files.
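For reference, the cds_from_genomic headers carry their annotation as bracketed `[key=value]` fields (gene, locus_tag, protein, protein_id, ...), so they are easy to turn into a table. A minimal sketch, assuming the fields are separated by "] [" as in the files I looked at; `parse_cds_header` is just an illustrative name.

```python
from typing import Dict, Tuple


def parse_cds_header(line: str) -> Tuple[str, Dict[str, str]]:
    """Return (sequence_id, fields) from a cds_from_genomic FASTA header.

    Assumes a header such as
    '>lcl|ACC_cds_PROT_1 [gene=x] [locus_tag=y] [protein=z] ...'.
    Splitting on '] [' keeps brackets inside a value intact,
    e.g. '[protein=[acyl-carrier-protein] S-malonyltransferase]'.
    """
    text = line.strip()
    if text.startswith(">"):
        text = text[1:]
    seq_id, _, rest = text.partition(" ")
    rest = rest.strip()
    fields: Dict[str, str] = {}
    if rest.startswith("[") and rest.endswith("]"):
        for chunk in rest[1:-1].split("] ["):
            key, sep, value = chunk.partition("=")
            if sep:
                fields[key] = value
    return seq_id, fields
```

The `protein` field is a product name only; it does not give KEGG or COG assignments, so those would still have to come from a tool such as eggNOG or KofamScan.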
