Gff3 Coordinate: Find Stop Codon On - Strand
11.0 years ago
Rvosa ▴ 580

Given the following GFF3, where is the stop codon supposed to be:

scaffold1.1     maker   gene    247127  258737  .       -       .       ID=...
scaffold1.1     maker   CDS     258659  258737  .       -       1       ID=...
scaffold1.1     maker   CDS     254856  254976  .       -       2       ID=...
scaffold1.1     maker   CDS     251358  251395  .       -       1       ID=...
scaffold1.1     maker   CDS     250084  250198  .       -       2       ID=...
scaffold1.1     maker   CDS     248687  248760  .       -       1       ID=...
scaffold1.1     maker   CDS     247127  247239  .       -       0       ID=...

My reasoning so far has been:

  • the last CDS is the one at 247127..247239 on the minus strand
  • because we are reading from right to left, the stop codon is at 247127..247130
  • also because we are on the minus strand, we need to reverse complement 247127..247130
  • the coordinates are 1-based, so I need to subtract 1 for each coordinate for any language that has 0-based indexes

Here's my confusion:

  • at 247127..247130 the sequence is GAT, so it's a reverse (but not complemented) stop codon. Is that right?
  • am I supposed to do something with the phase values?
gff3 coordinates strand codon • 3.6k views

Isn't the sequence denoted by 247127..247130 of length 4, not 3?


Indeed it is, apologies. See how these coordinates are driving me crazy? Harumph. I meant to say 247127..247129

11.0 years ago

Your reasoning is correct and the last codon should be the stop codon if the sequence is reverse complemented.

Also note that the last three bases are 247127, 247128 and 247129; you should not include 247130!
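As a rough illustration of the coordinate arithmetic, here is a minimal Python sketch. The function names (`reverse_complement`, `last_codon`) and the toy sequences are made up for this example and are not part of MAKER or any library. It assumes the scaffold is available as a plain string, that CDS coordinates are 1-based and inclusive as in GFF3, and that the final CDS segment is at least three bases long.

```python
# Hypothetical helpers for illustration only; coordinates follow GFF3
# conventions (1-based, inclusive), Python slices are 0-based, end-exclusive.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def last_codon(scaffold, cds_segments, strand):
    """Return the final three coding bases in transcript orientation.

    cds_segments is a list of (start, end) tuples taken from the CDS rows.
    Assumes the final CDS segment is at least 3 bases long.
    """
    if strand == "-":
        # On the minus strand the transcript ends at the LOWEST coordinate.
        start = min(s for s, _ in cds_segments)
        # Bases start, start+1, start+2 -> slice [start-1 : start+2].
        return reverse_complement(scaffold[start - 1:start + 2])
    # On the plus strand the transcript ends at the HIGHEST coordinate.
    end = max(e for _, e in cds_segments)
    # Bases end-2, end-1, end -> slice [end-3 : end].
    return scaffold[end - 3:end]


# Toy example: the minus-strand transcript of this scaffold reads ATGCCCGGGTAA.
toy_minus = "TTACCCGGGCAT"
print(last_codon(toy_minus, [(1, 12)], "-"))
```

For the gene in the question, the same call would use the six CDS intervals and strand "-", which selects positions 247127 to 247129 of scaffold1.1 before reverse complementing.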

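The phase indicates how many bases at the start of the current CDS (in the direction of translation) complete the codon that started in the previous CDS; in other words, how many bases to skip to reach the first full codon of this CDS. It does not affect the stop codon.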
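To make that definition concrete, here is a small Python sketch of how the phase of the next CDS follows from the length and phase of the current one. The helper names `cds_length` and `next_phase` are invented for this example, and the formula is derived from the definition above (bases to skip at the start of a CDS), so treat it as an illustration rather than what any particular annotation tool does.

```python
def cds_length(start, end):
    """Length of a GFF3 feature: coordinates are 1-based and inclusive."""
    return end - start + 1


def next_phase(length, phase=0):
    """Phase of the following CDS, given this CDS's length and phase.

    After skipping `phase` bases, (length - phase) bases are read in
    codons; the remainder is a partial codon at the end of this CDS,
    and the next CDS must supply the bases that complete it.
    """
    leftover = (length - phase) % 3   # bases of the unfinished codon
    return (3 - leftover) % 3         # bases still needed from the next CDS


# A 79-base CDS starting in phase 0 ends one base into a codon,
# so the next CDS begins with the two bases that complete it.
print(next_phase(79, 0))
```

Note that a CDS whose length is a multiple of three (with phase 0) leaves nothing unfinished, so the next phase is again 0.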


Thank you very much for your reply, this is the first time where the 'phase' thing is starting to make sense. So is it then the case that, if we have only two CDSs in the same gene, then the phase of cds2 is going to be length(cds1) % 3?

ETA: if that's how it works then I can also see that the phase can't affect the stop codon, because for the stop codon we're just counting "backwards" from the last position.


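Nearly, where % means the remainder after division: length(cds1) % 3 is the number of bases left over at the end of cds1, and the phase of cds2 is the number of bases needed to complete that codon, i.e. (3 - length(cds1) % 3) % 3, assuming cds1 itself has phase 0.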
