I ran rMATS on my BAM files and now have a list of retained introns with their upstreamEE and downstreamES coordinates and geneID information. I want to test the hypothesis that in-frame retained introns would be translated into predominantly hydrophobic amino acids (I already have the frame information: an intron is in-frame if its length is divisible by 3). ‘Hydrophobic’ could be defined as the percentage of amino acids encoded by the intron that are hydrophobic, relative to the corresponding percentage across all amino acids in those genes.
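To make the measure concrete, here is a minimal sketch of how the hydrophobic percentage could be computed for one sequence. It assumes the sequence is already on the coding strand and starts at a codon boundary. The hydrophobic residue set (A, V, I, L, M, F, W, C) is only one possible choice, and the function names `translate` and `hydrophobic_fraction` are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: one common definition of "hydrophobic"; other scales differ.
HYDROPHOBIC = set("AVILMFWC")


def translate(seq):
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def hydrophobic_fraction(protein):
    """Fraction of residues that are hydrophobic, ignoring stops and unknowns."""
    residues = [aa for aa in protein if aa not in "*X"]
    if not residues:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in residues) / len(residues)
```

The same function could be applied to the translated intron and to the full protein of the gene, and the two fractions compared.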
I am very new to these types of analyses, so it would be great to get some input on how to perform this analysis.
How can I also take into account that the intron boundaries do not need to coincide with codon boundaries?
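One possible way to think about this is in terms of phase: if the upstream exon ends partway through a codon, those leftover bases have to be carried into the intron before it is split into codons. The sketch below assumes the phase (0, 1 or 2 leftover bases at the end of the upstream exon) is known from the annotated CDS, and that both sequences are on the coding strand. The names `in_frame_codons` and `has_stop_codon` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_codons(upstream_exon, intron, phase):
    """Split a retained intron into codons in the reading frame of the transcript.

    phase is the number of bases (0, 1 or 2) at the 3' end of the upstream
    exon that belong to a codon not yet completed when the intron starts.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    # Carry the incomplete codon from the exon into the intron sequence.
    carried = upstream_exon[len(upstream_exon) - phase:] if phase else ""
    seq = (carried + intron).upper()
    # Trailing bases that do not fill a codon are dropped here.
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def has_stop_codon(codons):
    """True if any codon in the list is a stop codon."""
    return any(codon in STOP_CODONS for codon in codons)
```

For example, `in_frame_codons("ATGGC", "CAAATGA", 2)` carries `GC` into the intron, so the first codon read is `GCC`. Checking for stop codons may also matter, since an intron whose length is divisible by 3 can still contain an in-frame stop.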
Any input is appreciated. Thank you.