Hello, I'm working on a metagenomic analysis for which I need a reference set of ~1500 prokaryotic genomes (in FASTA format) from MetaHit and GenBank. My aim is to annotate the genes in these genomes with eggNOG orthologous groups.
On the eggNOG download site, the dataset "Bacteria non-supervised orthologous groups (bactNOG) and their proteins" (a flat file) seems suitable for my annotation.
The dataset looks like this:
## Protein name start_position end_position orthologous_group orthologous_group description
9685.ENSFCAP00000011039 1 363 meNOG04000 Leucine-Rich repeat protein SHOC-2
7159.AAEL014718-PA 5 527 meNOG04000 Leucine-Rich repeat protein SHOC-2
9606.ENSP00000352411 1 582 meNOG04000 Leucine-Rich repeat protein SHOC-2
7719.ENSCINP00000009123 9 531 meNOG04000 Leucine-Rich repeat protein SHOC-2
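To make the layout concrete, here is a minimal Python sketch of how I would read such a file into a lookup table from protein name to orthologous group. It assumes the columns are separated by whitespace (tabs or spaces), that lines starting with `#` are headers, and that everything after the fourth column is the description. The function name `parse_nog_members` is my own and not part of eggNOG.

```python
from collections import defaultdict


def parse_nog_members(lines):
    """Map each protein name to a list of (start, end, group, description).

    Assumes whitespace-separated columns in the order shown in the header:
    protein name, start, end, orthologous group, description.
    """
    members = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines such as "## Protein name ..."
        if not line or line.startswith("#"):
            continue
        # Split into at most 5 fields so the free-text description stays intact
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue  # not a data line in the assumed layout
        protein, start, end, group = fields[:4]
        description = fields[4] if len(fields) > 4 else ""
        members[protein].append((int(start), int(end), group, description))
    return dict(members)


# The four example lines from the flat file shown above
sample = """\
## Protein name start_position end_position orthologous_group orthologous_group description
9685.ENSFCAP00000011039 1 363 meNOG04000 Leucine-Rich repeat protein SHOC-2
7159.AAEL014718-PA 5 527 meNOG04000 Leucine-Rich repeat protein SHOC-2
9606.ENSP00000352411 1 582 meNOG04000 Leucine-Rich repeat protein SHOC-2
7719.ENSCINP00000009123 9 531 meNOG04000 Leucine-Rich repeat protein SHOC-2
"""

mapping = parse_nog_members(sample.splitlines())
for protein, hits in mapping.items():
    for start, end, group, description in hits:
        print(protein, start, end, group, description, sep="\t")
```

For the real file, `parse_nog_members(open(path))` should work the same way, where `path` is wherever the flat file was saved. This only gives a lookup for proteins already in eggNOG; it does not by itself assign groups to the genes in my own genomes.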
My question is: how can I perform this annotation, i.e. how can I annotate my set of prokaryotic genomes with this information? Or do I also need to take into account the file "Protein sequences of all species, with the eggnog protein name."?
Thank you!
Thank you, this will help me a lot!