Hi,
I will describe an example from my sequencing data to make it easier to understand. Say we captured some exons and sequenced them by 454, so I have long reads of about 300 bp. When I mapped them, I saw that for some reads one half maps to the end of exon 1 of a gene and the other half maps to the beginning of exon 2 of the same gene. I have a few explanations in mind:
- Maybe there are intronic deletions in these regions.
- There is mRNA contamination (I would like to know whether this is possible).
- There is a retrocopy inserted in the genome of the individual whose exons we sequenced.
Are there any other explanations for this scenario, and how can I determine which case is true?
Broad found this as well, and in the end they excluded case 2 and believed it was your case 1. I forget how, or whether, they excluded case 3, which seems to me the most likely.
Thanks. Since you said that case 3 seems the most likely, can you please suggest how I should decide between case 1 and case 3 (a bioinformatic approach :))?
I guess the easy things to check include: a) whether there are reads containing the intron; b) whether there are copy number changes, though this is hard for exome sequencing.
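For check a), here is a rough Python sketch of how one might sort reads around a single intron into "exon-exon" (the alignment skips the intron) and "exon-intron" (the read has aligned bases inside the intron). It assumes your mapper reports the skipped intron as a D or N operation in the SAM CIGAR string, which may not be true for every 454 mapper (some report split alignments as separate records instead). The function names, the tolerance and the minimum overlap are my own illustrative choices, and coordinates are assumed to be 0-based, half-open (SAM POS minus 1).

```python
import re

# Matches one CIGAR operation, e.g. "100M" or "1000N".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Return (blocks, gaps) on the reference as 0-based half-open intervals.

    blocks: stretches where read bases are aligned (M, =, X)
    gaps:   stretches skipped on the reference (D, N)
    """
    blocks, gaps = [], []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            blocks.append((ref, ref + length))
            ref += length
        elif op in "DN":
            gaps.append((ref, ref + length))
            ref += length
        # I, S, H and P do not consume reference bases
    return blocks, gaps


def classify_read(pos, cigar, intron_start, intron_end, min_overlap=10, tol=5):
    """Classify one read relative to one intron (hypothetical helper)."""
    blocks, gaps = aligned_blocks(pos, cigar)
    # Exon-exon: a reference gap whose ends match the intron within tol bases.
    for gap_start, gap_end in gaps:
        if abs(gap_start - intron_start) <= tol and abs(gap_end - intron_end) <= tol:
            return "exon-exon"
    # Exon-intron: enough aligned bases fall inside the intron.
    intronic = sum(
        max(0, min(end, intron_end) - max(start, intron_start))
        for start, end in blocks
    )
    if intronic >= min_overlap:
        return "exon-intron"
    return "uninformative"


if __name__ == "__main__":
    # Made-up example: intron spans 1000..2000 on the reference.
    print(classify_read(900, "100M1000N150M", 1000, 2000))
    print(classify_read(950, "250M", 1000, 2000))
```

Counting the two classes per junction tells you whether intron-containing reads exist at all. Keep in mind that with a heterozygous deletion or a retrocopy you would expect to see both classes, so this count alone will not separate case 1 from case 3.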
Just so that you have all cases at hand, how about a mapping error due to a repetitive region? I have not worked with exome sequencing data or with 454. However, even though the reads are long here (in comparison to RNA-Seq), is it possible that the region where you see the spliced reads is repetitive? Do you see some reads mapping across the exon-intron junction and others mapping across the exon-exon junction, or are the reads always spliced?
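One quick, rough way to look at the repetitive-region question is to check the mapping qualities of the spliced reads. The sketch below assumes your mapper writes a meaningful MAPQ into the SAM file and that low values indicate ambiguous placement; that convention differs between mappers, so the threshold of 10 is only a placeholder, and the function name is hypothetical.

```python
def low_mapq_fraction(mapqs, threshold=10):
    """Fraction of reads whose MAPQ is below threshold (0.0 for an empty list)."""
    if not mapqs:
        return 0.0
    return sum(1 for q in mapqs if q < threshold) / len(mapqs)


if __name__ == "__main__":
    # Made-up MAPQ values for the reads spanning one exon-exon junction.
    spliced_read_mapqs = [0, 0, 30, 60]
    print(low_mapq_fraction(spliced_read_mapqs))
```

If most of the spliced reads at a junction have low MAPQ, a repeat-driven mapping artifact becomes more plausible than if they are all confidently placed.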