5.1 years ago
Juke34
You could decide to use the evidence.gff and prothint.gff files from ProtHint.
In my case I already had an evidence-based MAKER annotation, so I decided to use that instead.
But it cannot be used as is, because GeneMark expects Intron, start_codon and stop_codon features, which are absent from the MAKER annotation GFF file.
So here are the steps to cheat successfully:
Prerequisite: AGAT
# add start and stop codons
agat_sp_add_start_and_stop.pl --gff maker_annotation.gff --fasta genome_sm.fa -o maker_annotation_startstop.gff
# add introns
agat_sp_add_introns.pl --gff maker_annotation_startstop.gff -o maker_annotation_startstop_introns.gff
# keep only the intron, start_codon and stop_codon features
awk '{if($3=="intron" || $3=="start_codon" || $3=="stop_codon") print $0}' maker_annotation_startstop_introns.gff > maker_annotation_startstop_introns_only.gff
# replace intron by Intron (otherwise GeneMark fails); GFF columns are tab-separated, \t is GNU sed syntax
sed -i 's/\tintron\t/\tIntron\t/' maker_annotation_startstop_introns_only.gff
# add an al_score attribute with a value above 0.3, otherwise the Intron features are thrown away
awk '{print $0";al_score=1"}' maker_annotation_startstop_introns_only.gff > maker_annotation_startstop_introns_only_al_score_tag.gff
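If you prefer fewer intermediate files, the last three steps (filter, rename, tag) can probably be collapsed into a single awk pass. This is only a sketch: the function names `make_genemark_hints` and `count_hint_types` are made up for illustration, and it assumes a tab-separated 9-column GFF as produced by the AGAT steps above.

```shell
# Hypothetical helper (not part of AGAT or GeneMark): keep only the intron,
# start_codon and stop_codon features, rename intron to Intron, and append
# al_score=1 to the attributes column, all in one pass.
# Assumes a tab-separated GFF with exactly 9 columns per feature line.
make_genemark_hints() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/    { next }                  # skip header and comment lines
       NF != 9 { next }                  # skip malformed lines
       $3 == "intron" { $3 = "Intron" }  # GeneMark expects a capital I
       $3 == "Intron" || $3 == "start_codon" || $3 == "stop_codon" {
         sub(/;$/, "", $9)               # avoid a double ";" before the new tag
         $9 = $9 ";al_score=1"
         print
       }' "$1"
}

# Hypothetical sanity check: number of features per type in a GFF file.
count_hint_types() {
  awk -F'\t' '!/^#/ && NF == 9 { n[$3]++ } END { for (t in n) print t, n[t] }' "$1" | sort
}

# Usage (file names as in the steps above):
#   make_genemark_hints maker_annotation_startstop_introns.gff > maker_annotation_startstop_introns_only_al_score_tag.gff
#   count_hint_types maker_annotation_startstop_introns_only_al_score_tag.gff
```

The count should list only Intron, start_codon and stop_codon. If a lowercase intron still shows up, the rename step did not match.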
Now you can run GeneMark:
/path/to/genemark/gmes_petap.pl \
--evidence maker_annotation_startstop_introns_only_al_score_tag.gff \
--training -v \
--sequence genome_sm.fa \
--cores 16 \
--EP maker_annotation_startstop_introns_only_al_score_tag.gff