I used minimap2 to align a de novo transcriptome file to a reference genome.
With samtools I converted the minimap2 output to BED, and I wrote my own script to create the GFF that is passed to MAKER's est_gff option.
According to this maker-devel topic (https://groups.google.com/g/maker-devel/c/2j9NWwl-4xY), the alignment GFF from minimap2 needs to follow the alignment format used by GFF3 (i.e. match/match_part).
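For illustration, here is a minimal sketch of such a BED-to-GFF3 conversion. It is not the script used here. It assumes BED12 input (with blockSizes/blockStarts columns), and the function name `bed12_to_gff3` and the ID layout are hypothetical. It also assumes the blocks cover the whole transcript with no clipping or indels, so the Target coordinates are simply cumulative block lengths.

```python
def bed12_to_gff3(line, source="minimap2"):
    """Turn one BED12 line into GFF3 match / match_part lines (sketch)."""
    f = line.rstrip("\n").split("\t")
    chrom, name, score, strand = f[0], f[3], f[4], f[5]
    start, end = int(f[1]), int(f[2])  # BED: 0-based start, exclusive end
    sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    offsets = [int(x) for x in f[11].rstrip(",").split(",")]

    # GFF3 is 1-based and inclusive, so only the start shifts by one.
    hit_id = f"{chrom}:{name}:hit:{start + 1}-{end}"
    out = ["\t".join([chrom, source, "expressed_sequence_match",
                      str(start + 1), str(end), score, strand, ".",
                      f"ID={hit_id};Name={name}"])]

    total = sum(sizes)
    consumed = 0
    for i, (size, off) in enumerate(zip(sizes, offsets), 1):
        g_start = start + off + 1
        g_end = start + off + size
        if strand == "-":
            # On the minus strand the leftmost genomic block is the
            # transcript's 3' end, so transcript coordinates count down.
            t_start, t_end = total - consumed - size + 1, total - consumed
        else:
            t_start, t_end = consumed + 1, consumed + size
        consumed += size
        attrs = (f"ID={hit_id}:hsp:{i};Parent={hit_id};"
                 f"Target={name} {t_start} {t_end} +")
        out.append("\t".join([chrom, source, "match_part",
                              str(g_start), str(g_end), score, strand, ".",
                              attrs]))
    return out


if __name__ == "__main__":
    bed = ("scaffold15014-5\t103439\t103740\tTRINITY_DN55863_c2_g1_i1\t1000\t+"
           "\t103439\t103740\t0\t2\t156,106\t0,195")
    print("\n".join(bed12_to_gff3(bed)))
```

Each BED12 record becomes one parent expressed_sequence_match line and one match_part child per block, linked through the Parent attribute.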
I ran three sample tests of MAKER to check the GFF I created from minimap2:
- I included protein sequences (FASTA) and mRNA sequences, without the transcriptome, as a baseline for test no. 2.
- I included both protein and mRNA sequences and provided the est_gff file that was created from minimap2.
- I included all sequences (proteins, mRNA, transcriptome) and let MAKER use BLAST for all alignments.
When I compared the final GFF files from tests 1 and 2, the results were identical. I checked for the est_gff input in the test 2 output, and the file did contain alignments from minimap2:
scaffold15014-5 est_gff:minimap2 expressed_sequence_match 275244 275456 1000 + . ID=scaffold15014-5:hit:10067:3.12.0.2;Name=TRINITY_DN110156_c0_g2_i1;score=1000
scaffold15014-5 est_gff:minimap2 match_part 275244 275456 1000 + . ID=scaffold15014-5:hsp:16084:3.12.0.2;Parent=scaffold15014-5:hit:10067:3.12.0.2;Target=TRINITY_DN110156_c0_g2_i1 1 213 +;Gap=M213
I think this means that MAKER did not reject the format I provided, but for some reason it did not use the alignments as hints for its annotation predictions. The minimap2 GFF I provided to est_gff looks like this:
scaffold15014-5 minimap2 expressed_sequence_match 103440 103740 1000 + . ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Name=TRINITY_DN55863_c2_g1_i1
scaffold15014-5 minimap2 match_part 103440 103595 1000 + . ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740:hsp:1;Parent=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Target=TRINITY_DN55863_c2_g1_i1 1 156;
scaffold15014-5 minimap2 match_part 103635 103740 1000 + . ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740:hsp:2;Parent=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Target=TRINITY_DN55863_c2_g1_i1 157 262;
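One way to see how the two formats differ is to compare the attribute columns of a match_part line from MAKER's output with one from the input file. The sketch below does that for the two match_part records quoted in this post, assuming the real files are tab-separated. The helper name `attr_dict` is hypothetical. It only lists the differences and does not establish whether any of them matters to MAKER.

```python
def attr_dict(gff_line):
    """Parse column 9 of a tab-separated GFF3 line into a dict."""
    col9 = gff_line.rstrip("\n").split("\t")[8]
    # Skip empty fields, e.g. the one left by a trailing semicolon.
    return dict(pair.split("=", 1) for pair in col9.split(";") if pair)


# match_part line as written by MAKER in the test 2 output
maker_part = "\t".join([
    "scaffold15014-5", "est_gff:minimap2", "match_part",
    "275244", "275456", "1000", "+", ".",
    "ID=scaffold15014-5:hsp:16084:3.12.0.2;"
    "Parent=scaffold15014-5:hit:10067:3.12.0.2;"
    "Target=TRINITY_DN110156_c0_g2_i1 1 213 +;Gap=M213",
])

# match_part line from the est_gff input file
input_part = "\t".join([
    "scaffold15014-5", "minimap2", "match_part",
    "103440", "103595", "1000", "+", ".",
    "ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740:hsp:1;"
    "Parent=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;"
    "Target=TRINITY_DN55863_c2_g1_i1 1 156;",
])

maker_attrs = attr_dict(maker_part)
input_attrs = attr_dict(input_part)

# Attribute keys present in MAKER's line but absent from the input line
missing_keys = sorted(set(maker_attrs) - set(input_attrs))
print(missing_keys)  # ['Gap']

# Number of whitespace-separated fields in Target (the 4th is the strand)
print(len(maker_attrs["Target"].split()), len(input_attrs["Target"].split()))  # 4 3
```

For these two records, the input line lacks the Gap attribute and the strand field in Target.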
Thanks for your consideration and help.
Thank you! The results were much better using AGAT.