14 months ago
Charles Plessy
In two different projects I need to modify annotation files. For instance, I need to split a gene into two independent ones, following evidence that they are separate transcriptional units. I also need to create a new alternative isoform of a gene with a longer 3′UTR. The source files for the annotations are in GFF or GTF format, and the only tools that I can think of are emacs or vim. But isn't there a better tool that would automagically take care that the gene, transcript and parent IDs stay consistent?
If you need a toolkit to work with GTF/GFF files, check out AGAT: https://agat.readthedocs.io/en/latest/agat_for_you.html
Thanks! Unfortunately it turned out that RSEM (which I use through the nf-core RNA-seq pipeline) does not accept the GFF files output by AGAT, because AGAT does not put spaces between the semicolon-separated tag-value fields. RSEM is clearly wrong there, but the cost of not using a standard pipeline is too high for me. Fortunately a simple sed command fixes the problem.
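For anyone hitting the same issue, here is a rough Python sketch of that kind of fix. It is not the sed command mentioned above (which was not posted), and it assumes the problem looks like `gene_id "g1";transcript_id "t1";` in column 9. The function name `add_attribute_spaces` is made up for this example. It adds a space after each separating semicolon in the attribute column and leaves semicolons inside quoted values alone.

```python
import sys


def add_attribute_spaces(line: str) -> str:
    """Insert a space after each ';' separator in column 9 if one is missing.

    Assumption: tab-separated GTF/GFF-like lines with 9 columns. Comment
    lines and lines with a different column count are returned unchanged.
    """
    if line.startswith("#"):
        return line
    stripped = line.rstrip("\n")
    cols = stripped.split("\t")
    if len(cols) != 9:
        return line
    attrs = cols[8]
    out = []
    in_quotes = False
    for i, ch in enumerate(attrs):
        out.append(ch)
        if ch == '"':
            # Track quoting so a ';' inside a quoted value is not touched.
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            nxt = attrs[i + 1] if i + 1 < len(attrs) else ""
            if nxt and not nxt.isspace():
                out.append(" ")
    cols[8] = "".join(out)
    # Preserve the original line ending, if any.
    return "\t".join(cols) + line[len(stripped):]


if __name__ == "__main__":
    # Filter mode: read the annotation on stdin, write the fixed one to stdout.
    for raw in sys.stdin:
        sys.stdout.write(add_attribute_spaces(raw))
```

Saved under a name of your choice (say `fix_spaces.py`), it would be used as a filter: `python fix_spaces.py < agat_output.gtf > fixed.gtf`. Check the result on a few lines before feeding it to the pipeline.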
It seems a bit like open-source genome editors have fallen out of fashion, possibly because the pipelines are now so fully automated, as one-off tools, that nobody is expected to touch the results, let alone attempt to manually curate all these genomes. I know:
Manatee is out of the question (updated in 2007, Ubuntu 8 VM image), and the others have not aged well; in particular, they are way too cumbersome to install for the casual desktop user.
Possibly the best option is to try out Apollo from a Docker container, but with the simple setup you might not be able to save data. So maybe try the production Docker setup, and evaluate carefully whether it is worth the bother.