When downloading the annotation for a genome from Ensembl, both a GTF and a GFF3 file are available. Having read the README files for the two, I'm having trouble determining whether they contain exactly the same information in different formats, or whether the actual annotations differ. The wording makes it sound like the GFF3 file might include some non-gene features that aren't in the GTF, and that the two files might have different evidence requirements for including a gene. Does anyone know exactly what the differences are?
Did you check this? It will give you a basic idea of what these file types contain.
That's describing the differences in the formats, which I'm very familiar with. What I'm asking about is whether the Ensembl genome annotations in the two formats contain the exact same gene and feature sets, or if there's a difference in what is included in each. It's a question about Ensembl's data procedures, not the file formats.
Isn't this answerable from downloading relevant pairs of files and comparing?
It's not trivial, but doable, to determine whether they're the same. If they're different, I'm not sure how I'd discern what criteria Ensembl used to generate the two.
I don't know if this case is the norm or the exception, but while working with the S. cerevisiae annotation I noticed that the GFF3 had entries for whole chromosomes as features, while the GTF did not.
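If you want to check this for another species, a rough first pass is to tally the feature types (column 3) in each file and see which types appear in only one of them. Below is a minimal sketch of that idea, not a validated tool. The function names are my own, and the two file paths are whatever Ensembl GTF/GFF3 pair you downloaded, passed on the command line.

```python
import gzip
import sys
from collections import Counter


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed annotation file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def feature_type_counts(lines):
    """Count the values of column 3 (feature type), skipping comments and blanks."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GTF and GFF3 records both have nine tab-separated columns.
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


def compare_feature_types(gtf_counts, gff3_counts):
    """Return (types only in the GTF, types only in the GFF3), each sorted."""
    gtf_types, gff3_types = set(gtf_counts), set(gff3_counts)
    return sorted(gtf_types - gff3_types), sorted(gff3_types - gtf_types)


def main(gtf_path, gff3_path):
    with open_maybe_gzip(gtf_path) as gtf, open_maybe_gzip(gff3_path) as gff3:
        gtf_counts = feature_type_counts(gtf)
        gff3_counts = feature_type_counts(gff3)
    only_gtf, only_gff3 = compare_feature_types(gtf_counts, gff3_counts)
    print("Only in GTF: ", only_gtf)
    print("Only in GFF3:", only_gff3)


# Usage (hypothetical script name): python compare_types.py annotation.gtf.gz annotation.gff3.gz
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

One caveat: the two formats do not always use the same name for the same kind of feature (as far as I know, the GTF uses a generic `transcript` type where the GFF3 uses more specific terms such as `mRNA`), so a type showing up in only one file does not by itself mean the underlying annotation differs. Comparing the sets of gene and transcript IDs from the attributes column would be a more direct test of whether the gene sets match.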