Hi! I recently used Prokka to annotate the genes in a set of bins. To make the dataset of annotated genes more manageable, I'd like to categorize all the genes according to their function. I haven't yet decided how narrow the categorization should be. I have been told that EggNOG-mapper should be able to do this; however, when I read about EggNOG, it looks like it also annotates genes. As that would be double work, I'd like to ignore that function and only categorize the already annotated genes. Is this possible? Or am I better off not running Prokka and just using EggNOG? (Maybe I'm wrong, but I have had a hard time interpreting their GitHub page.)
Thanks!
Ah, that makes more sense! So all I have to provide to EggNOG is the protein sequence file (.faa) from the Prokka output? The reason I'm asking rather than just trying is the vast databases I would have to download. If EggNOG is not the right tool, it would be a lot of work for nothing.
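Before committing to the database download, it may help to check how many proteins the .faa file actually holds, since that gives a rough idea of the size of the job. A minimal sketch in Python, assuming an ordinary FASTA file where every record starts with a ">" header line; the function name count_fasta_records is made up for illustration and is not part of Prokka or EggNOG:

```python
def count_fasta_records(path):
    """Return (number of records, total residues) for a FASTA file.

    Assumes plain-text FASTA: header lines start with '>' and all
    other non-empty lines are sequence.
    """
    n_records = 0
    n_residues = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                n_records += 1  # one header per protein
            else:
                n_residues += len(line)  # sequence line
    return n_records, n_residues
```

Calling count_fasta_records on the Prokka .faa file returns the number of predicted proteins and their combined length in amino acids.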
I used the DIAMOND database for a quick first pass and then refined with more specific databases.
What's the difference between HMMER and DIAMOND?
DIAMOND
- Accelerated BLAST compatible local sequence aligner https://github.com/bbuchfink/diamond
HMMER
- biosequence analysis using profile hidden Markov models http://hmmer.org/
Thanks!! So they seem kind of similar. Why would I use one over the other?
The HMMER version gave me more results, as far as I recall. DIAMOND was way faster, though.
I am currently running EggNOG with DIAMOND. My file is about 2 MB. So far, DIAMOND has been running for 50 minutes with no results (on 4 cores). Do you recall if this is the usual time, or might there be an issue?
That depends. What database are you running against?
My shell command looks like this: python emapper.py -i input-file --output output-file -m diamond.
EggNOG's GitHub page states that when using DIAMOND, no database should be specified.
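One thing that might be worth checking: the command above does not say how many cores to use, even though the run is described as being on 4 cores. As far as I know, emapper.py accepts a --cpu option, but please confirm with python emapper.py --help for your version. Here is a small sketch that assembles the same command with the core count made explicit. The helper build_emapper_command and the file names are hypothetical, chosen only for illustration:

```python
import shlex


def build_emapper_command(faa_path, output_prefix, cpus=4, script="emapper.py"):
    """Assemble the argument list for an eggNOG-mapper DIAMOND run.

    The -i, --output and -m flags are taken from the command in this
    thread. --cpu is an assumption to verify against `emapper.py --help`.
    """
    return [
        "python", script,
        "-i", str(faa_path),
        "--output", str(output_prefix),
        "-m", "diamond",
        "--cpu", str(cpus),  # assumption: check the flag for your version
    ]


# Hypothetical file names, for illustration only.
cmd = build_emapper_command("bin1.faa", "bin1_eggnog", cpus=4)
print(shlex.join(cmd))
```

The printed line can be pasted into the shell, or the list can be handed to subprocess.run(cmd, check=True) to start the run from Python.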