Hi, I have a recently assembled bacterial genome. It consists of a couple dozen scaffolds, and I have run them through three gene prediction tools: Prodigal, Glimmer, and GeneMark.
Now that I have three different GFF files describing putative genes, I want to combine these results into a consensus list. I have been looking into one such combiner, JIGSAW, and was wondering if anyone has experience using this software with bacteria (I think it was made with eukaryotes in mind).
I am running with the following command:
jigsaw -l -f "myGenome.fasta" -m "jigsaw.output" -e "myEvidenceFile"
And my evidence file looks like this:
scaffolds_GeneMark.gff gff geneprediction coding 1.0
scaffolds_Prodigal.gff gff geneprediction coding 1.0
scaffolds_Glimmer.gff gff geneprediction coding 1.0
Typically, JIGSAW wants the user to specify the type of exon that was annotated (start, internal, end, etc.), but since this is a prokaryote, I thought it was best to just use the "coding" identifier.
The problem is, for a particular contig that has an average of 10 genes predicted from each individual method (7 of which overlap perfectly between all three methods), JIGSAW is only predicting 2 genes!
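For reference, here is a minimal Python sketch of how agreement between the three predictors can be tallied, and how a simple majority-vote list could be built as a fallback. It is only an illustration, not how JIGSAW works. The function names are made up for this example, and it assumes each file is tab-delimited GFF with genes reported as "CDS" features. Two predictions are treated as the same gene only if scaffold, start, end, and strand all match exactly.

```python
from collections import Counter


def read_features(path, feature_type="CDS"):
    """Return a set of (seqid, start, end, strand) tuples from a GFF file.

    Assumption: the file is tab-delimited GFF and genes appear as rows whose
    third column equals feature_type ("CDS" here; adjust per predictor).
    """
    features = set()
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and comment/directive lines.
            if not line.strip() or line.startswith("#"):
                continue
            cols = line.rstrip("\n").split("\t")
            if len(cols) < 8 or cols[2] != feature_type:
                continue
            features.add((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return features


def count_votes(paths, feature_type="CDS"):
    """Count how many prediction files report each exact gene coordinate."""
    votes = Counter()
    for path in paths:
        # Each file contributes at most one vote per gene because
        # read_features returns a set.
        votes.update(read_features(path, feature_type))
    return votes


def consensus(paths, min_votes=2, feature_type="CDS"):
    """Return genes reported by at least min_votes files, sorted by position."""
    votes = count_votes(paths, feature_type)
    return sorted(gene for gene, n in votes.items() if n >= min_votes)
```

Calling consensus(["scaffolds_GeneMark.gff", "scaffolds_Prodigal.gff", "scaffolds_Glimmer.gff"], min_votes=3) would list the genes on which all three methods agree perfectly. Exact matching is strict: predictors often agree on the stop codon but differ on the start, so keying on scaffold, strand, and stop coordinate alone would be a more lenient variant.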
Any comments or suggestions are much appreciated.
Also posted at SEQanswers. Reading the JIGSAW paper, it seems very much geared towards eukaryotic genomes and may simply be inappropriate for bacteria.
If that's the case, does anyone have a recommendation for prokaryotic gene prediction combiners?