This is a revised version of an earlier query that may not have been stated very clearly:
I have noticed a mismatch between the exonStarts / exonEnds coordinates and the exon range reported by the UCSC Genome Browser's annotation of the hg19 human reference genome.
Specifically, the exonStarts and exonEnds coordinates in the table do not match the range given when the exon sequences are retrieved. Typically, the exonStarts coordinate is 1 nucleotide before the start of the range, as in the example below:
name chromosome strand exonStarts exonEnds exonFrame
NM_030806 chr1 + 184559872 184559949 1
The range reported with the sequence, however, is:
>hg19_refGene_NM_030806_2 range=chr1:184559873-184559949 5'pad=0 3'pad=0 strand=+ repeatMasking=none
GAAAAAAGTGCCAGCTCAAATGTAAGACTTAAAACTAATAAAGAGGTTCCGGGATTAGTTCATCAACCCAGAGCAAA
Usually the mismatch between exonStarts and range is +1 nucleotide, but sometimes it is more than this. What is the reason for the discrepancy between range and exonStarts/exonEnds, and which number is the actual coordinate of the first nucleotide in the exon?
Thanks.
However, if that were the case, wouldn't both exonStarts and exonEnds be offset by -1 with respect to the FASTA coordinates? Instead, exonStarts is offset by -1 with respect to the FASTA range, while exonEnds matches it.
Also, which of the coordinate systems (0-based or 1-based) is consistent with Ensembl coordinates?
I updated my answer to include more info about that.
Thanks for the update. I assume that the coordinates in the Ensembl annotation start at 1?
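For anyone landing on this thread later: the pattern raised in the comments (start differs by 1, end matches) is what you would expect if the table stores 0-based, half-open intervals while the FASTA header shows a 1-based, closed range. Below is a minimal sketch of that conversion, using the NM_030806 exon from the question. The function names are made up for illustration and are not part of any UCSC or Ensembl tool, and the sketch does not try to explain the cases where the mismatch is larger than 1.

```python
def table_to_one_based(start0, end0):
    """Convert a 0-based, half-open interval [start0, end0)
    to a 1-based, closed interval [start1, end1].

    Only the start shifts; the end value stays the same number.
    """
    return start0 + 1, end0


def one_based_to_table(start1, end1):
    """Inverse conversion: 1-based, closed -> 0-based, half-open."""
    return start1 - 1, end1


def exon_length(start0, end0):
    """Length of a 0-based, half-open interval is simply end - start."""
    return end0 - start0


# Values copied from the question (exonStarts / exonEnds for NM_030806).
exon_start, exon_end = 184559872, 184559949

start1, end1 = table_to_one_based(exon_start, exon_end)
print(f"chr1:{start1}-{end1}")            # chr1:184559873-184559949
print(exon_length(exon_start, exon_end))  # 77
```

Under this assumption the first nucleotide of the exon is at 1-based position exonStarts + 1, which matches the range= value in the FASTA header.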