Hi all, I'm quite new to genome annotation. So far I have an annotation lifted over from the reference genome with Liftoff, an annotation predicted by BRAKER (with hints from short-read RNA-seq, reference proteins, and long-read RNA-seq), and a GTF transcript file generated by long-read analysis (SQANTI3-filtered results). I would like to merge all of these annotations, keeping every unique transcript.
My first question is whether this merge is appropriate, given that the BRAKER results already combine all of the hints and that the other two tools (Liftoff and SQANTI3) use the same evidence. My reason for doing it is that BRAKER uses TSEBRA to select transcript models at the end, which may discard some evidence that the other tools retain. From what I can see, the Liftoff results have 3000 more transcripts than the BRAKER results after TSEBRA selection, and the SQANTI3-filtered results still contain 60 new genes that are not in the BRAKER/TSEBRA results.
My second question is which tools I can use if I go ahead with this merge. I looked at gffcompare and Tama_merge, and both seem to merge only the transcripts: after gffcompare, my GFF file contained only exon features (no CDS features at all). I haven't tried Tama_merge yet because it needs a lot of file format conversion. Could you please give me some suggestions?
Also, after merging, I would like to keep the GTF/GFF file in annotation format rather than transcriptome format. That is, the file should have a line for each gene, a line for each transcript, and then the exon, CDS, and other features for that transcript. How can I do that?
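To make the layout I mean concrete, here is a minimal sketch that counts the feature types in column 3 of a GFF/GTF file, which is one way to see whether the gene, transcript, exon and CDS lines are all present. It assumes a standard tab-separated 9-column file; the function name `count_feature_types` and the sample records are made up for illustration and are not from my data.

```python
from collections import Counter


def count_feature_types(lines):
    """Count column-3 feature types in GFF/GTF lines, skipping comments."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines such as "##gff-version 3".
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A valid GFF/GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Hypothetical records showing the "annotation format" hierarchy:
# one gene line, one transcript line, then its exon and CDS lines.
sample = [
    "##gff-version 3",
    "chr1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1",
    "chr1\texample\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\texample\texon\t100\t400\t.\t+\t.\tID=exon1;Parent=tx1",
    "chr1\texample\tCDS\t150\t400\t.\t+\t0\tID=cds1;Parent=tx1",
    "chr1\texample\texon\t600\t900\t.\t+\t.\tID=exon2;Parent=tx1",
    "chr1\texample\tCDS\t600\t801\t.\t+\t1\tID=cds1;Parent=tx1",
]

counts = count_feature_types(sample)
# Feature types expected in a full annotation (GFF3 naming assumed here).
missing = [t for t in ("gene", "mRNA", "exon", "CDS") if counts[t] == 0]
print(dict(counts))
print("missing feature types:", missing)
```

On a real file, the same function can be called with an open file handle, for example `count_feature_types(open("merged.gff"))`, where `merged.gff` is a placeholder path. For an exon-only file like my gffcompare output, the CDS count would be zero. In GTF files the transcript line is usually typed `transcript` rather than `mRNA`, so the expected list would need adjusting.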
Last but not least, I would like the gene names in the final annotation file to match those in the reference. How can I do this? Should I use BLAST for this purpose?
Thanks a lot!
Thanks!! AGAT is very useful!