Hey,
I would like to retrieve some exon sequences, translate them to amino acid sequences, and then BLAST them against a proteome.
I am working with exon sequences from Ensembl.
Ensembl uses something called phase to mark the codons that are interrupted by introns, as follows:
Let N and # denote coding bases, alternating codon by codon so that the codon boundaries are visible, and let x denote intron bases.
Exon start phase: 0 - no interruption. NNNxxxxxxNNN###NNN###
Exon start phase: 1 - first codon's first base is in the previous exon. NxxxxxxxNN###NNN###NNN
Exon start phase: 2 - first codon's first two bases are in the previous exon. NNxxxxxxxxN###NNN###NNN
In addition to start phase, there is also an end phase, which works similarly.
Exon end phase: 1 - last codon's last base is in the next exon. NNN###NNxxxxxxxN
Exon end phase: 2 - last codon's last two bases are in the next exon. NNN###NxxxxxxNN
I assume these descriptions are correct - please let me know if they are not.
I downloaded the phase information using BioMart so that I can later map it back to the exon sequences and remove the interrupted codons. The problem is that BioMart provides only a single phase value per exon, which I guess is the start phase. Does anyone know why the end phase is missing?
Thank you
Sorry, I think you are wrong. The end phase of one exon and the start phase of the next exon should be in frame (meaning they have the same phase), and your explanation of the end phase would not allow that.
The position of an exon/intron boundary within a codon. A phase of zero means the boundary falls between codons, one means between the first and second base and two means between the second and third base. Exons have a start and end phase, whereas introns have just one phase. A boundary in a non-coding region has a phase of -1.
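For what it's worth, here is a minimal Python sketch of trimming the interrupted codons under the definition quoted above. It assumes a phase p means that p bases of the split codon lie before the boundary, so an exon with start phase p begins with (3 - p) % 3 leftover bases and an exon with end phase q ends with q bases of an incomplete codon. The names trim_partial_codons and translate are made up for illustration and are not part of Ensembl or BioMart, and exons with a phase of -1 are simply rejected here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def trim_partial_codons(exon_seq, start_phase, end_phase):
    """Return only the complete codons of an exon (hypothetical helper).

    Assumes the phase counts the bases of the split codon that lie
    before the exon/intron boundary, as in the definition above.
    """
    if start_phase not in (0, 1, 2) or end_phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2 (-1 marks a non-coding boundary)")
    lead = (3 - start_phase) % 3       # bases finishing a codon begun in the previous exon
    end = len(exon_seq) - end_phase    # drop bases of a codon finished in the next exon
    core = exon_seq[lead:end] if end >= lead else ""
    if len(core) % 3 != 0:
        raise ValueError("phases are inconsistent with the exon length")
    return core


def translate(cds):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# Start phase 1: the exon opens with the last 2 bases of a split codon ("GG").
# End phase 2: the exon closes with the first 2 bases of the next split codon ("TT").
exon = "GGATGGCCTT"
core = trim_partial_codons(exon, start_phase=1, end_phase=2)
print(core, translate(core))  # ATGGCC MA
```

Note that trimming this way drops the residues encoded by the split codons. If those residues matter for the BLAST search, translating the transcript's full coding sequence instead of individual exons avoids losing them.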