It's a very naive question, but I certainly lack some clarity here. I have contigs from MEGAHIT and would like to perform functional annotation on them.
For this purpose I first used Prodigal for gene prediction and then Prokka for annotation.
My question: if I wanted to do the functional annotation myself instead of using Prokka, what would I have to do? That is, which tools and reference databases would I need?
Prokka is amazing software. From ease of use to results, it has never failed me. It is a wrapper around a collection of different programs, so the best place to start (if you don't want to use Prokka) would be Prokka itself, i.e., looking at which programs Prokka uses to produce its annotations. I hope I am making sense here.
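To make the "look at what Prokka uses" advice concrete: Prokka's README lists the tools it wraps, including Prodigal for coding sequences, Aragorn for tRNAs, Barrnap for rRNAs, and BLAST+ and HMMER for similarity searches against its databases. Below is a minimal Python sketch of the first two steps of a do-it-yourself run: gene calling with Prodigal in metagenome mode, then a protein domain search with hmmscan. It only builds the command lines. The input name final.contigs.fa (MEGAHIT's usual output) and a locally downloaded, hmmpress-ed Pfam-A.hmm are assumptions, and turning the hits into product names is still up to you; that is the part Prokka automates.

```python
"""Sketch of the first two steps of a manual annotation run.

Assumptions: Prodigal and HMMER are installed, and Pfam-A.hmm is a local,
hmmpress-ed Pfam database. Commands are only built here; pass each list to
subprocess.run(cmd, check=True) to actually execute it.
"""
from __future__ import annotations

from pathlib import Path


def prodigal_cmd(contigs: Path, outdir: Path) -> list[str]:
    # -p meta: metagenomic mode, suited to contigs from mixed communities.
    # -a: translated protein sequences; -o with -f gff: gene coordinates as GFF.
    return [
        "prodigal",
        "-i", str(contigs),
        "-p", "meta",
        "-a", str(outdir / "proteins.faa"),
        "-o", str(outdir / "genes.gff"),
        "-f", "gff",
    ]


def hmmscan_cmd(proteins: Path, hmm_db: Path, outdir: Path, cpus: int = 4) -> list[str]:
    # --tblout: one tabular line per protein-vs-family hit, easy to parse later.
    # -o: send the verbose human-readable report to a file instead of stdout.
    return [
        "hmmscan",
        "--cpu", str(cpus),
        "--tblout", str(outdir / "pfam_hits.tbl"),
        "-o", str(outdir / "hmmscan.log"),
        str(hmm_db),
        str(proteins),
    ]


if __name__ == "__main__":
    out = Path("annotation")  # this directory must exist before the tools run
    steps = [
        prodigal_cmd(Path("final.contigs.fa"), out),
        hmmscan_cmd(out / "proteins.faa", Path("Pfam-A.hmm"), out),
    ]
    for cmd in steps:
        print(" ".join(cmd))
```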
Thank you for the reply.
Questions: 1. Can Prokka take all kingdoms at once, like this: --kingdom 'Archaea|Bacteria|Mitochondria|Viruses'?
2. Can I also use Prokka on bins from MaxBin2 output? If so, should it be called on each bin, the same way it is called on contigs?
It seems you have already binned your sequences. If so, the next step is to assess the bins for completeness and assign them to broad taxonomic categories (if possible). A tool for that is CheckM. Its output looks something like this:
You should probably copy and paste the lines above into a wider window so you can read them properly. Anyway, it shows that the first two bins in this sample are archaeal and the next one is bacterial, so you can use that to specify the kingdom for Prokka. I don't think you can tell Prokka to search all kingdoms at once.
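If you want to automate that choice for many bins, here is a rough Python sketch that reads CheckM's tab-separated report and maps each bin's marker lineage to a single value for Prokka's --kingdom option. It assumes CheckM was run with something like checkm lineage_wf -x fasta bins/ checkm_out/ --tab_table -f checkm_report.tsv, so the report has 'Bin Id' and 'Marker lineage' columns. Lineages below kingdom level (such as c__Thermoprotei) need a lookup table; EXTRA_RANKS below is a hypothetical one you would maintain yourself, and anything it cannot resolve comes back as None for manual review.

```python
"""Map CheckM marker lineages to one Prokka --kingdom value per bin.

Assumption: the report was written with --tab_table, so it is tab-separated
with 'Bin Id' and 'Marker lineage' columns. EXTRA_RANKS is a hypothetical,
user-maintained lookup; CheckM does not provide it.
"""
from __future__ import annotations

import csv
import io

# Prokaryotic kingdoms that can appear as k__ lineages in CheckM output.
PROKKA_KINGDOMS = {"Archaea", "Bacteria"}

# Hypothetical lookup for lineages reported below kingdom level.
EXTRA_RANKS = {"c__Thermoprotei": "Archaea"}


def lineage_to_kingdom(lineage: str) -> str | None:
    """Turn e.g. 'k__Archaea (UID146)' into 'Archaea'; None if unresolved."""
    taxon = lineage.split(" (")[0].strip()  # drop the '(UID...)' suffix
    if taxon.startswith("k__") and taxon[3:] in PROKKA_KINGDOMS:
        return taxon[3:]
    return EXTRA_RANKS.get(taxon)


def read_kingdoms(report_text: str) -> dict[str, str | None]:
    """Parse a CheckM --tab_table report into {bin id: kingdom or None}."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return {row["Bin Id"]: lineage_to_kingdom(row["Marker lineage"]) for row in reader}
```

Usage would be along the lines of read_kingdoms(open("checkm_report.tsv").read()); bins mapped to None (for example those placed only at root) are the ones to inspect by hand.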
My recommendation is to annotate using Prokka rather than manually. We are talking about 10 minutes vs. many hours or even days, and I am still not sure that manual annotation would be more successful. If you truly feel that your sequence annotation ability is much better than Prokka's, you can always start from the Prokka annotation and tackle the uncharacterized proteins.
You recommend Prokka as well!
Question: Would you recommend concatenating all CheckM bins into one single file to use as input for Prokka, or do you run Prokka on individual bins?
Also, what is recommended for taxonomic classification of bins?
Prokka should run on individual bins. If you concatenate them, it would be the same as running it on the whole metagenomic assembly.
After binning, create a .fasta file for each bin and put them all in the same directory. In the example above they were named group_00000X.fasta. After you run CheckM according to its instructions, the second column of the output (see above) will be the taxonomic classification. That may be only at the kingdom level, or go all the way down to genus. Either way, it will provide enough information for you to assign the kingdom in Prokka.
For some bins there will be no annotation because CheckM only annotates prokaryotes. Those bins could be viruses, eukaryotes, or short contigs that can't be annotated conclusively.
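A minimal Python sketch of the one-run-per-bin approach, assuming the bin FASTA files sit in one directory and you already have a bin-to-kingdom mapping (for instance parsed from the CheckM report). It only builds the commands. --metagenome is a real Prokka option for highly fragmented genomes, but whether it helps on your bins is a judgment call. Bins with no kingdom are set aside rather than silently run with Prokka's default kingdom (Bacteria).

```python
"""Build one Prokka command per bin instead of one for a concatenated file.

Assumptions: bins are FASTA files in one directory (e.g. group_000001.fasta),
and 'kingdoms' maps file stems to 'Archaea', 'Bacteria' or None.
"""
from __future__ import annotations

from pathlib import Path


def prokka_commands(bin_dir: Path, kingdoms: dict[str, str | None], cpus: int = 8):
    """Return (commands, unassigned): one Prokka command per bin FASTA file."""
    commands: list[list[str]] = []
    unassigned: list[str] = []
    for fasta in sorted(bin_dir.glob("*.fasta")):
        kingdom = kingdoms.get(fasta.stem)
        if kingdom is None:
            # No CheckM lineage: possibly a virus, a eukaryote or short fragments.
            unassigned.append(fasta.stem)
            continue
        commands.append([
            "prokka",
            "--kingdom", kingdom,
            "--metagenome",  # Prokka's option for highly fragmented genomes
            "--outdir", str(bin_dir / "prokka" / fasta.stem),  # separate output per bin
            "--prefix", fasta.stem,
            "--cpus", str(cpus),
            str(fasta),
        ])
    return commands, unassigned
```

Each command can then be run one bin at a time with subprocess.run(cmd, check=True), and the unassigned list tells you which bins still need a closer look.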
This is really a great help. I am taking the opportunity to ask more :) CheckM actually provides very low-resolution classification, and in order to get better resolution I was running blastn against the nt database on each combined bin (as shown below). Do you think it's a good idea to do so? Or can you recommend something better?
The granularity of CheckM's classification will depend on your sample. For example, c__Thermoprotei (UID147) is a very clear-cut annotation, and k__Archaea (UID146) obviously is less so. Getting better resolution depends on how much time you want to spend and for what purpose. For example, if you just want to know for internal use what the most likely annotation is, you could blast the 5-10 largest contigs against the nt database and see whether there is a consensus among the best matches. If the top hits for all of them are the same and the identity is fairly high, that will probably do the trick.
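For that quick internal check, a hedged Python sketch: pick the largest contigs from a bin FASTA, then, after blasting them, see whether their top hits agree. It assumes blastn was run with a custom tabular format such as -outfmt "6 qseqid sseqid pident bitscore sscinames" (the sscinames column is only filled in if NCBI's taxdb files are available), and the 95% identity cut-off is an arbitrary illustration, not an accepted threshold.

```python
"""Pick a bin's largest contigs and check BLAST top hits for consensus.

Assumptions: blastn tabular output with columns
qseqid sseqid pident bitscore sscinames, and an illustrative 95% cut-off.
"""
from __future__ import annotations

from collections import Counter


def largest_contigs(fasta_text: str, n: int = 10) -> list[tuple[str, str]]:
    """Return the n longest (name, sequence) records, longest first."""
    records, name, chunks = [], None, []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []  # keep the ID, drop the description
        elif line.strip():
            chunks.append(line.strip())
    if name is not None:
        records.append((name, "".join(chunks)))
    return sorted(records, key=lambda r: len(r[1]), reverse=True)[:n]


def top_hit_consensus(blast_tsv: str, min_identity: float = 95.0):
    """Return (most common top-hit species, its votes, number of queries)."""
    best = {}  # query -> (bitscore, pident, species) of its highest-scoring hit
    for line in blast_tsv.splitlines():
        if not line.strip():
            continue
        qseqid, _sseqid, pident, bitscore, species = line.split("\t")[:5]
        hit = (float(bitscore), float(pident), species)
        if qseqid not in best or hit[0] > best[qseqid][0]:
            best[qseqid] = hit
    # Only top hits above the identity cut-off get a vote.
    counts = Counter(sp for _, pid, sp in best.values() if pid >= min_identity)
    if not counts:
        return None, 0, len(best)
    species, votes = counts.most_common(1)[0]
    return species, votes, len(best)
```

If the vote count equals the number of queried contigs, every top hit agrees at high identity, which is the situation described above; anything less is a hint that the bin needs the more rigorous treatment.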
To publish your findings, you will have to be more rigorous. There are many ways to do it, but you can start with 16S rRNA (if available), build a tree with a representative set of species, and see where yours falls. The same can be done with a concatenated set of proteins. Pick a paper from a reputable journal describing the annotation of a new species from metagenomic data, and most of these steps will be described there in greater detail.
Thank you for the detailed explanation. It was really helpful. As you mentioned, publishing requires more rigorous analysis. Can you suggest a good paper that shows this part in detail?
Hello!
I am analyzing some metagenomic data on a Galaxy server. I assembled the paired-end reads into contigs and scaffolds with metaSPAdes, which were then clustered into bins with MaxBin2. I assessed the quality of these bins with CheckM. Now I want to annotate these bins using Prokka, but a fatal error keeps appearing:
Argument "1.7.8" isn't numeric in numeric lt (<) at /usr/local/bin/prokka line 259.
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/040/757/40757985/_job_tmp -Xmx28g -Xms256m
Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/corral4/main/jobs/040/7
I uploaded the bins in FASTA format, so I don't know what the problem could be.
Thank you
Most likely you need to install the newest Prokka version, as the problem seems to be in how it recognizes program versions. While you are at it, I suggest you make sure that all the other programs Prokka depends on are up to date. Lastly, your question has nothing to do with the original post and should be posted as a separate thread rather than as a comment here. Only people who are already following this almost 3-year-old thread are likely to see your question, so you are limiting your audience.
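For context on the message itself: it is a Perl warning raised because a dotted version string like "1.7.8" was used with the numeric < operator. Such a string is not a valid number, so versions have to be compared component by component. The Python sketch below only illustrates that idea; it is not Prokka's code, and updating Prokka and its dependencies remains the practical fix.

```python
"""Why a dotted version such as '1.7.8' trips a plain numeric comparison.

Illustration only, not Prokka's implementation: compare versions as tuples
of integers, padding the shorter one with zeros.
"""
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """'1.7.8' -> (1, 7, 8); trailing non-digits such as '2.10.1+' are dropped."""
    parts = []
    for piece in text.strip().split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def is_at_least(found: str, required: str) -> bool:
    """True if version 'found' is the same as or newer than 'required'."""
    a, b = parse_version(found), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # treat '1.7' as '1.7.0'
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        float("1.7.8")
    except ValueError as err:
        print("numeric parse fails:", err)
    print(is_at_least("1.7.8", "1.7.2"))  # compares (1, 7, 8) with (1, 7, 2)
```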