Hi all,
I am assembling the transcriptome of a non-model organism. I assembled the RNA-seq reads with Trinity (genome-guided assembly) and, to get a GFF, I mapped the FASTA output from Trinity to the reference genome using GMAP. The GFF output has MANY records where the sign in the strand column is '-' even though the direction is specified as 'sense'. The same kind of mismatch happens when the direction is 'antisense'. I thought it was weird because it seems to happen almost exactly half of the time. Here is an example:
LQNS02276481.1 phaw gene 17436406 17487190 . + . ID=TRINITY_GG_63141_c0_g1_i1.path1;Name=TRINITY_GG_63141_c0_g1_i1;**Dir=antisense**
I checked, and the correct direction always seems to be the one in 'Dir=', not the one in the strand column (the 7th column, '+' here).
The command I ran was:
gmap -d phaw --gff3-add-separators=0 -f 2 -n 1 Trinity-GG.fasta > gmap_phaw.gff3
GMAP version 2020-06-01 called with args: gmap.sse42
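To put a number on the "about half of the time" impression, one option is to tally every combination of the strand column and the 'Dir=' attribute over the gene records. This is only a minimal sketch: the function names are made up for illustration, and it assumes each gene record carries a 'Dir=' attribute as in the example line above.

```python
from collections import Counter


def parse_gff3_line(line):
    """Return (feature_type, strand, attributes) for one GFF3 record, or None."""
    if not line.strip() or line.startswith("#"):
        return None  # skip blank lines, comments and directives
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Fall back to whitespace splitting for lines pasted without tabs.
        fields = line.split(None, 8)
    if len(fields) != 9:
        return None
    attrs = dict(
        item.split("=", 1) for item in fields[8].strip().split(";") if "=" in item
    )
    return fields[2], fields[6], attrs


def tally_strand_vs_dir(lines):
    """Count (strand column, Dir attribute) pairs over gene records."""
    counts = Counter()
    for line in lines:
        record = parse_gff3_line(line)
        if record is None:
            continue
        feature_type, strand, attrs = record
        if feature_type == "gene":
            counts[(strand, attrs.get("Dir", "NA"))] += 1
    return counts


if __name__ == "__main__":
    # File name taken from the command above.
    with open("gmap_phaw.gff3") as handle:
        for pair, n in sorted(tally_strand_vs_dir(handle).items()):
            print(pair, n)
```

If all four strand/Dir combinations turn up in similar numbers, the two fields are probably describing different things rather than one of them being wrong.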
The orientation of very short, single-exon transcripts is very difficult to predict, since without introns there are no splice sites to indicate the strand.
Try looking at longer transcripts with 2-3 exons. Check the orientation: does it make sense with respect to the ATG, the exons, etc.?
Visualize the GMAP GFF3 in a (web-based) genome browser and compare it with existing annotation. Does it fit?
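A rough way to follow the single-exon versus multi-exon suggestion is to count the exons of each transcript and break the strand/Dir combinations down by that. The sketch below rests on assumptions about the layout of the file: that mRNA records carry a 'Dir=' attribute like the gene record in the question, and that exon records point to their mRNA through 'Parent='. The function name is hypothetical.

```python
from collections import Counter


def attributes_of(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(kv.split("=", 1) for kv in field.strip().split(";") if "=" in kv)


def dir_by_exon_count(lines):
    """Count (exon class, strand, Dir) triples, one per mRNA record."""
    exon_counts = Counter()  # mRNA ID -> number of exon records
    mrna_info = {}           # mRNA ID -> (strand, Dir)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = attributes_of(fields[8])
        if fields[2] == "mRNA":
            # Assumption: the mRNA record has a Dir attribute.
            mrna_info[attrs.get("ID")] = (fields[6], attrs.get("Dir", "NA"))
        elif fields[2] == "exon":
            # Assumption: exons reference their mRNA via Parent.
            exon_counts[attrs.get("Parent")] += 1
    table = Counter()
    for mrna_id, (strand, direction) in mrna_info.items():
        n_exons = exon_counts.get(mrna_id, 0)
        exon_class = "single-exon" if n_exons <= 1 else "multi-exon"
        table[(exon_class, strand, direction)] += 1
    return table
```

Calling `dir_by_exon_count(open("gmap_phaw.gff3"))` gives a small table. If the odd strand/Dir combinations are concentrated in the single-exon class, that would be consistent with the point above about short transcripts.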