Posted 4.7 years ago by Juke34 (8.9k)
This question comes up often: many conversion solutions are available, but it is hard to understand how they differ. Several GTF versions exist, and it is not always clear which GTF version a given converter produces.
At the same time, I had implemented my own converter and wanted to compare it with the others.
So I put together a quick benchmark, available here:
https://github.com/NBISweden/GAAS/blob/master/annotation/knowledge/gff_to_gtf.md
Contributions are welcome.
Not directly on topic, but I will say that all these file formats should be killed off for good. BED, GFF, wiggle, bedgraph, SAM, etc. should all be replaced, because they are nothing more than ill-defined ad-hoc conventions turned into "standards" (other than SAM, which was designed by a shortsighted committee).
But I know it would be hard to make this work without a mandate from the top: funding agencies, repositories, etc.
There is an elemental misunderstanding that permeates all of bioinformatics, where the column number leaks into the data. A value in column 1 has a special meaning because it is in column 1, yet we still don't know exactly what is in that column.
Instead, there should be standardized column headers such as `start` and `end`, so that when a header says `start`, you and every tool would know what the column contains: a one-based, leftmost coordinate. If someone wants to store coordinates that count from 0, that column would need to be called `start0`, and so on. Then a tool can ask whether a file contains certain columns, rather than asking for GTF2 vs GFF3 format. All other cases are simple to "convert": just stick a header on top.

Bioinformatics is hamstrung and bottlenecked by these ridiculously ineffective and fundamentally misdesigned formats; we ought to put them out of their misery (SAM/BAM included).
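To make the header idea concrete, here is a minimal Python sketch. The column names `chrom`, `start`, `start0` and `end` are hypothetical (no existing standard defines them), and it assumes `end` is a one-based, inclusive rightmost coordinate in both cases, which is how a zero-based, half-open BED end maps onto it. Columns are looked up by name, so their order in the file does not matter.

```python
import csv
import io


def read_intervals(text):
    """Parse a tab-separated table with a header line and return
    (chrom, start, end) tuples with a one-based start.

    Hypothetical convention (illustration only, not a real standard):
      "start"  -> one-based, leftmost coordinate
      "start0" -> zero-based, leftmost coordinate
      "end"    -> one-based, inclusive rightmost coordinate
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fields = reader.fieldnames or []

    # Decide the meaning of the start column from its name, not its position.
    if "start" in fields:
        start_col, offset = "start", 0
    elif "start0" in fields:
        start_col, offset = "start0", 1  # shift zero-based to one-based
    else:
        raise ValueError("no 'start' or 'start0' column in header")

    return [
        (row["chrom"], int(row[start_col]) + offset, int(row["end"]))
        for row in reader
    ]
```

For example, `read_intervals("chrom\tstart0\tend\nchr1\t99\t200\n")` and `read_intervals("end\tchrom\tstart\n200\tchr1\t100\n")` both return `[("chr1", 100, 200)]`, even though the second table lists its columns in a different order.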
Completely agree!
Hey Juke34,
AUGUSTUS prints GFF3 with UTR regions labeled as 3'-UTR and 5'-UTR. How do I deal with that?
If these are not the terms expected by the GTF format, AGAT should automatically convert them to the proper ones.
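As a rough illustration of that kind of renaming (not AGAT's actual code), the Python sketch below rewrites column 3 (the feature type) of a tab-separated GFF/GTF line when it holds a known UTR alias. It assumes GTF2.2 as the target, whose feature vocabulary uses `5UTR` and `3UTR`; other GTF flavours expect different terms, and the alias table `UTR_TO_GTF22` is a hypothetical, incomplete example.

```python
# Hypothetical alias table, NOT taken from AGAT. Targets assume GTF2.2,
# whose optional feature types include "5UTR" and "3UTR".
UTR_TO_GTF22 = {
    "5'-UTR": "5UTR",            # AUGUSTUS-style label from the question
    "3'-UTR": "3UTR",
    "five_prime_UTR": "5UTR",    # Sequence Ontology term names
    "three_prime_UTR": "3UTR",
}


def normalize_feature_type(line):
    """Return the line with a known UTR alias in column 3 replaced.

    Comment/header lines and lines without 9 tab-separated columns are
    returned unchanged. A trailing newline, if any, stays in column 9.
    """
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) < 9:
        return line
    cols[2] = UTR_TO_GTF22.get(cols[2], cols[2])
    return "\t".join(cols)
```

A full converter has to do much more than this (deriving `gene_id`/`transcript_id`, choosing which features to keep per GTF version), which is where the tools in the benchmark tend to differ.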
Wow, that's great. Thanks for the super fast reply.