I realize that, given only an exon's start and end positions, the reading frame CANNOT be inferred by simply splitting the sequence into codons from the first base, because an exon may begin in the middle of a codon.
So my question is: how can I computationally determine the reading frame of an exon given its START and END positions?
Specifically I am using the output from rMATS for skipped exons.
Upstream Exon ---- Skipped Exon ---- Downstream Exon
This is the alternative splicing event detected. There will be six genomic positions: a start and an end for each of the three exons.
So is this enough information to determine the reading frame of the transcript?
My initial thoughts would be to
- Translate the upstream exon in all 3 possible reading frames (the strand is given in rMATS)
- For the gene in question, match each translation from the previous step against the protein sequence
- For the matched reading frame, carry the frame across the splice junction into the skipped exon
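To make the steps above concrete, here is a minimal sketch in plain Python of what I have in mind. It assumes the exon sequence has already been extracted and is on the transcript strand (reverse-complemented for minus-strand genes), and that the protein sequence for the gene is available as a string. The function names (translate, find_frame, next_exon_offset) are my own and hypothetical, not part of rMATS or any library, and very short exons could match the protein by chance.

```python
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[i] for i, (a, b, c) in enumerate(product(BASES, repeat=3))
}


def reverse_complement(seq):
    """Reverse-complement a DNA string (for minus-strand exons)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def translate(seq, offset=0):
    """Translate complete codons only, skipping `offset` leading bases."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def find_frame(exon_seq, protein):
    """Return the offsets (0, 1, 2) whose translation occurs in the protein."""
    matches = []
    for offset in range(3):
        peptide = translate(exon_seq, offset)
        if peptide and "*" not in peptide and peptide in protein:
            matches.append(offset)
    return matches


def next_exon_offset(offset, exon_length):
    """Bases to skip at the start of the next exon to stay in frame."""
    return (offset - exon_length) % 3


# Toy example (made-up sequences, not real data).
upstream = "GAAAACCGCGT"
protein = "MKTAYIAK"
frames = find_frame(upstream, protein)
print(frames, next_exon_offset(frames[0], len(upstream)))
```

If exactly one offset matches, next_exon_offset gives the number of bases to skip at the start of the skipped exon before the first complete codon.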
But I feel this is a naive approach, and possibly someone way smarter than me has already figured out a clever solution.
Do you have an example? I tried using blastx on my skipped exon sequence and it didn't return anything, which is odd because the exon is annotated in a RefSeq GTF.
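Since the exon is in the GTF, one check I could try is to read the frame column (column 8) of the CDS records that overlap the skipped exon. In a GTF, that value is the number of bases to skip before the first complete codon of the CDS feature. Below is a rough sketch under some assumptions: the rMATS start is 0-based (the exonStart_0base column) while GTF coordinates are 1-based inclusive, the chromosome names match between the two files, and the function name cds_frames is hypothetical. Note that a CDS record can be shorter than its exon when the exon contains the start or stop codon, so the frame applies to the CDS start, not necessarily the exon start.

```python
def cds_frames(gtf_lines, chrom, strand, exon_start_0base, exon_end):
    """Collect (start, end, frame) for CDS records overlapping an exon.

    exon_start_0base / exon_end follow the rMATS convention
    (0-based start, 1-based end); GTF records are 1-based inclusive.
    """
    exon_start = exon_start_0base + 1  # convert to 1-based
    hits = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "CDS":
            continue
        if fields[0] != chrom or fields[6] != strand:
            continue
        start, end, frame = int(fields[3]), int(fields[4]), fields[7]
        overlaps = start <= exon_end and end >= exon_start
        if overlaps and frame in ("0", "1", "2"):
            hits.append((start, end, int(frame)))
    return hits


# Made-up GTF lines for illustration only.
example = [
    'chr1\tRefSeq\texon\t101\t200\t.\t+\t.\tgene_id "G"; transcript_id "T";',
    'chr1\tRefSeq\tCDS\t101\t200\t.\t+\t2\tgene_id "G"; transcript_id "T";',
]
print(cds_frames(example, "chr1", "+", 100, 200))
```

With a real file, the lines could come from open("annotation.gtf"), where the path is a placeholder. If no CDS record overlaps the exon, the exon may be untranslated (UTR or non-coding), which could also explain an empty blastx result.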