Are GENCODE GTFs really one-indexed for the start coordinate and zero-indexed for the end coordinate?
10.1 years ago

I am looking at the schema for GENCODE GTFs, and it states that the starts are one-indexed but does not mention what indexing the ends use. I do not want to blithely assume that the ends are also one-indexed: the UCSC database dumps use zero-indexing for the start and one-indexing for the end, so GENCODE might use the reverse convention.

Does anyone know for sure? Sources, or an explanation of how you learned or computed the answer, would be nice.

gencode gtf • 5.0k views
10.1 years ago

Both are one-indexed: http://www.ensembl.org/info/website/upload/gff.html

  1. start - Start position of the feature, with sequence numbering starting at 1.
  2. end - End position of the feature, with sequence numbering starting at 1

I would add that what matters is not only the indexing but also whether the interval is open or closed (inclusive) at each coordinate. These are distinct concepts. UCSC may describe them as if they were the same thing, but I think that just makes things more confusing. The UCSC coordinates are zero-based and open at the upper limit (half-open).

You are correct that GTF coordinates are indexed from 1 and that the interval includes both of the listed coordinates.
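For what it's worth, here is a minimal Python sketch of how the two conventions relate, assuming GTF is one-based and closed and UCSC/BED-style coordinates are zero-based and half-open, as described above. The function names and the example coordinates are hypothetical, chosen only for illustration.

```python
def gtf_to_half_open(start, end):
    """Convert a 1-based closed interval [start, end] (GTF style)
    to a 0-based half-open interval [start - 1, end) (UCSC/BED style)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end for a GTF interval")
    # Only the start shifts; the end value is numerically unchanged.
    return start - 1, end


def half_open_to_gtf(start, end):
    """Convert a 0-based half-open interval [start, end)
    back to a 1-based closed interval [start + 1, end]."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end for a half-open interval")
    return start + 1, end


def gtf_length(start, end):
    """Length of a 1-based closed interval: both endpoints are counted."""
    return end - start + 1


def half_open_length(start, end):
    """Length of a 0-based half-open interval: end is excluded."""
    return end - start


# Hypothetical feature coordinates, used only to show the arithmetic.
gtf_start, gtf_end = 11869, 12227
bed_start, bed_end = gtf_to_half_open(gtf_start, gtf_end)
```

The feature length is the same in both systems; only the formula differs (`end - start + 1` for the closed interval, `end - start` for the half-open one). That is why the end coordinate looks identical in both formats even though the conventions differ.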
