Given the following GFF3, where is the stop codon supposed to be?
scaffold1.1 maker gene 247127 258737 . - . ID=...
scaffold1.1 maker CDS 258659 258737 . - 1 ID=...
scaffold1.1 maker CDS 254856 254976 . - 2 ID=...
scaffold1.1 maker CDS 251358 251395 . - 1 ID=...
scaffold1.1 maker CDS 250084 250198 . - 2 ID=...
scaffold1.1 maker CDS 248687 248760 . - 1 ID=...
scaffold1.1 maker CDS 247127 247239 . - 0 ID=...
My reasoning so far has been:
- the last CDS is the one at 247127..247239 on the minus strand
- because we are on the minus strand and reading from right to left, the stop codon is at 247127..247130
- also because we are on the minus strand, we need to reverse complement 247127..247130
- the coordinates are 1-based, so I need to subtract 1 for each coordinate for any language that has 0-based indexes
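The coordinate arithmetic in the steps above can be sketched in Python. This is a minimal sketch, not a reference implementation: the helper names (`reverse_complement`, `stop_codon_interval`, `fetch`) are hypothetical, and it assumes the CDS features include the stop codon (as the GFF3 specification describes) and that the last CDS segment is at least three bases long. Note that a Python slice is half-open, so only the start coordinate is decremented.

```python
# Hypothetical helpers for illustration only.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def stop_codon_interval(cds_intervals, strand):
    """Return the 1-based inclusive (start, end) of the last three CDS bases.

    Assumes the stop codon is part of the CDS and does not span two segments.
    """
    if strand == "-":
        # On the minus strand the 3' end is the lowest coordinate.
        start = min(s for s, _ in cds_intervals)
        return start, start + 2
    end = max(e for _, e in cds_intervals)
    return end - 2, end

def fetch(seq, start, end, strand):
    """Extract a 1-based inclusive interval from a 0-based Python string."""
    sub = seq[start - 1:end]  # half-open slice: only start is shifted
    return reverse_complement(sub) if strand == "-" else sub

# CDS coordinates copied from the listing above.
cds = [
    (258659, 258737), (254856, 254976), (251358, 251395),
    (250084, 250198), (248687, 248760), (247127, 247239),
]

print(stop_codon_interval(cds, "-"))  # (247127, 247129)

# Toy plus-strand sequence (made up): bases 3..5 are CTA,
# which reads as TAG on the minus strand.
print(fetch("AACTAGG", 3, 5, "-"))  # TAG
```

With the real scaffold loaded as a string, `fetch(scaffold, *stop_codon_interval(cds, "-"), "-")` would return the three bases as they are read on the minus strand.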
Here's my confusion:
- at 247127..247130 the sequence is GAT, which is the stop codon TAG reversed but not complemented. Is that right?
- am I supposed to do something with the phase values?
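On the phase question, here is a small sketch of how phase is defined in the GFF3 specification: the number of bases to skip at the 5' end of a CDS segment to reach the first base of the next complete codon. The function names are hypothetical, and the code assumes the first CDS segment (the highest-coordinate one on the minus strand) starts in phase 0, which the listing does not guarantee.

```python
def next_phase(phase, length):
    """Phase of the following CDS segment, given this segment's phase and length."""
    return (3 - (length - phase) % 3) % 3

def phase_chain(cds_intervals, strand, first_phase=0):
    """Return (per-segment phases in transcription order, phase after the last segment).

    first_phase=0 is an assumption (a complete 5' end), not something read from the file.
    """
    # Transcription order is descending coordinates on the minus strand.
    ordered = sorted(cds_intervals, reverse=(strand == "-"))
    phases = []
    phase = first_phase
    for start, end in ordered:
        phases.append(phase)
        phase = next_phase(phase, end - start + 1)
    return phases, phase

# CDS coordinates copied from the listing above.
cds = [
    (258659, 258737), (254856, 254976), (251358, 251395),
    (250084, 250198), (248687, 248760), (247127, 247239),
]

phases, trailing = phase_chain(cds, "-")
print(phases)    # [0, 2, 1, 2, 1, 2]
print(trailing)  # 0, i.e. the CDS ends on a codon boundary
```

Under that assumption the chain matches the four middle phase values in the listing but not the first and last ones; this sketch cannot tell whether that comes from the annotation or from the assumption. The total CDS length is 540 bases, a multiple of three, so if the CDS is complete, the final three bases form the last codon regardless of the per-segment phases.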
Isn't the sequence denoted by 247127..247130 of length 4, not 3?
Indeed it is, apologies. See how these coordinates are driving me crazy? Harumph. I meant to say 247127..247129.