I have recently started using MaxEntScan and am somewhat confused by its manual. I tried to contact the author, but the email address no longer seems to work, so I am posting the question here in case anyone has used the software before.
According to the web page, the input for scoring 5' splice sites should be:
Each sequence must be 9 bases long. [3 bases in exon][6 bases in intron]
Input sequences as a FastA file with one sequence per line (no linebreaks). Non-ACGT sequences will not be processed.
Example Fasta File
> dummy1
cagGTAAGT
> dummy2
gagGTAAGT
> dummy3
taaATAAGT
...
I am confused here. The 5' UTR is at the beginning of an exon, so should the input sequence instead be [6 bases in intron][3 bases in exon]?
Thank you for your reply, petebio. But even if it refers to the 5' end of the intron, shouldn't the intron still come first, then the exon? The sequence is intron + 5' UTR + exon; there is no intron after the 5' UTR.
If you are looking at the 5' splice site, it should be exon first, then intron. That way you include the bases flanking the splice site, which is located at the exon/intron boundary.
So in other words, you should include the last 3 bases at the 3' end of the exon, and the first 6 bases at the 5' end of the intron.
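As a rough illustration of that rule, here is a minimal Python sketch that cuts the 9-mer out of a longer sequence. The function name `five_prime_ss_window` and the example sequence are made up for this post, and the code assumes the sequence is already given 5' to 3' on the transcript strand and that you know where the exon ends. The lowercase-exon, uppercase-intron casing only mirrors the example file in the manual; I am assuming it is cosmetic.

```python
def five_prime_ss_window(seq, exon_end):
    """Return [3 exon bases][6 intron bases] around a 5' splice site.

    seq      -- sequence read 5'->3' on the transcript strand (assumption)
    exon_end -- 0-based index of the first intron base, so the exon is seq[:exon_end]
    """
    # Make sure there are enough bases on both sides of the boundary.
    if exon_end < 3 or exon_end + 6 > len(seq):
        raise ValueError("need 3 exon bases and 6 intron bases around the boundary")

    exon_part = seq[exon_end - 3:exon_end].lower()    # last 3 bases of the exon
    intron_part = seq[exon_end:exon_end + 6].upper()  # first 6 bases of the intron
    window = exon_part + intron_part

    # The manual says non-ACGT sequences will not be processed.
    if set(window.upper()) - set("ACGT"):
        raise ValueError("window contains a non-ACGT base: " + window)
    return window


# Hypothetical example: a 9-base exon followed by the start of an intron.
toy_sequence = "ATGGCCCAG" + "GTAAGTTTTCC"
print(">toy_site")
print(five_prime_ss_window(toy_sequence, 9))  # cagGTAAGT
```

Each 9-mer produced this way goes on its own line under a header line, as in the example file from the manual.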
If there is no intron present then it does not make sense to look for a splice site there. You should concentrate only on cases where there is an exon/intron boundary present.
In your original post you mention that the 5' UTR is at the beginning of the exon. Are you actually looking at the 3' splice site here? Keep in mind that the UTR is part of the exon, since it is included in the final mRNA. So in these cases the splice site will be located at the boundary between the intron and the UTR.
Thank you. It really helps.