Thanks to everyone who responded. While I didn't get the answer I was hoping for, I realized that I didn't know enough about the matter to ask a proper question :). I've got more reading to do.
Not to beat around the bush, I'm writing a script that would compare an RNAseq mapping to an annotation file to determine how "good" the mapping is (and maybe do some additional analyses). That's why I'm studying RNAseq and annotation files, to better understand what could occur.
If someone is up to it, I have a follow-up question:
I'm looking at a BED annotation file for S. cerevisiae S288c. There are two annotation entries, for transcripts YOR343W-A and YOR343W-B, and they puzzle me a lot. They both start at the same position in the genome and are on the same strand. That would mean that they relate to the same gene (namely YOR343W). However, transcript A has only one exon of length 1317, while transcript B has two exons, the first of length 1294 and the second of length 4019, with only one base in between.
Two things puzzle me here:
- Transcript B seems to have an intron of length 1
- The only exon of transcript A is 23 bases longer than the first exon of transcript B and starts at the same position. It actually contains all of exon 1, the one skipped base, and a small part (22 bases) of exon 2 of transcript B.
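To double-check those numbers, here is a minimal Python sketch of how the gap and the overlap can be computed from BED12 lines. The chromosome name and start coordinate (chrXV, 1000) are placeholders I made up, not the real annotation; only the block sizes and the one-base gap come from the figures above. The helper names (parse_bed12_exons, gap_lengths, overlap) are hypothetical too.

```python
def parse_bed12_exons(line):
    """Return (name, strand, exons) for one BED12 line.

    Exons are (start, end) tuples in 0-based, half-open genomic coordinates.
    """
    fields = line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    name, strand = fields[3], fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(x) for x in fields[10].rstrip(",").split(",")][:block_count]
    starts = [int(x) for x in fields[11].rstrip(",").split(",")][:block_count]
    exons = [(chrom_start + s, chrom_start + s + n) for s, n in zip(starts, sizes)]
    return name, strand, exons


def gap_lengths(exons):
    """Lengths of the gaps ("introns") between consecutive exon blocks."""
    return [nxt[0] - cur[1] for cur, nxt in zip(exons, exons[1:])]


def overlap(a, b):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Placeholder coordinates: only the block sizes (1317; 1294 and 4019)
# and the one-base gap are taken from the post.
line_a = "\t".join(["chrXV", "1000", "2317", "YOR343W-A", "0", "+",
                    "1000", "2317", "0", "1", "1317,", "0,"])
line_b = "\t".join(["chrXV", "1000", "6314", "YOR343W-B", "0", "+",
                    "1000", "6314", "0", "2", "1294,4019,", "0,1295,"])

name_a, strand_a, exons_a = parse_bed12_exons(line_a)
name_b, strand_b, exons_b = parse_bed12_exons(line_b)

gaps_b = gap_lengths(exons_b)                 # gap between exon 1 and exon 2 of B
shared = overlap(exons_a[0], exons_b[1])      # part of B's exon 2 covered by A
extra = (exons_a[0][1] - exons_a[0][0]) - (exons_b[0][1] - exons_b[0][0])

print(name_b, "gap lengths:", gaps_b)
print(name_a, "exon is", extra, "bases longer than exon 1 of", name_b)
print("bases of exon 2 covered by", name_a + ":", shared)
```

With these inputs the gap comes out as 1 base, the length difference as 23, and the overlap with exon 2 as 22, which is consistent with 1294 + 1 + 22 = 1317.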
There are several other pairs of annotations that look really similar to the one above (e.g. YOR192C).
I would be grateful if someone could briefly explain the mechanism behind these two transcripts. Is this alternative splicing, and if so, which type?
This is pretty much it. There's a selective pressure due to NMD (nonsense-mediated decay) to have dedicated starting and finishing exons - more so for finishing exons than starting exons, but that's only my impression (I don't have numbers to back that up).
A few years ago I was asked to design several million primer pairs to detect all splice variants of all human/mouse/rat genes for some pharma company. One of the requirements was that while the first primer pair could go anywhere on a splice boundary, the second primer pair should go somewhere in the last (3') exon. The theory was that the last exon is usually transcribed in most or all splice variants. I don't know how true that is, but that's what they wanted.