Hi Guys,
I have some assembled genomic sequences, and I know the exact frames I need to translate to obtain the coding sequences and putative protein sequences. However, I do not know where to cut the translated frame, i.e. where the coding region ends and where the intron-exon boundaries (splice sites) fall. Is there a way to get the coding region from given frames? Please share your knowledge.
Thank you!
Please show what you have; if you know the exact frame, then that's where to cut!
Thank you for your reply, Karl. When I say assembled, I mean that the genomic sequences are still unannotated and consist of multiple scaffolds. Suppose I have three scaffolds for a particular protein-coding gene, and I have to translate each scaffold in two different frames. After merging all the translated frames, I end up with an unreasonably long peptide sequence/coding region, because translation continues past the 'GT' and 'AG' boundaries and therefore includes non-coding regions as well. In this case I think I need transcriptomic sequences to infer the coding regions unambiguously. Please correct me if that is not the case. Thanks again!
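To make the problem concrete, here is a rough Python sketch of what happens when a genomic frame is translated straight through. It uses the standard genetic code and only the standard library. The function names (`translate`, `stop_free_segments`, `candidate_splice_sites`) and the toy sequence are made up for illustration. It splits a translated frame into stop-free stretches and lists every 'GT' and 'AG' dinucleotide as a possible donor or acceptor. It does not decide which of those sites are real, which is exactly the part that needs transcript or homologous protein evidence.

```python
# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(seq, frame=0):
    """Translate seq in the given forward frame (0, 1 or 2).

    Stop codons become '*'; codons with ambiguous bases become 'X'.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def stop_free_segments(seq, frame=0, min_len=1):
    """Return (nt_start, nt_end, peptide) for each stretch between stops.

    Coordinates are 0-based, end-exclusive, on the nucleotide sequence.
    These are only upper bounds on exons: a real exon may end earlier,
    at a splice site inside the stretch.
    """
    protein = translate(seq, frame)
    segments = []
    start = 0
    # Append a sentinel stop so the final stretch is flushed as well.
    for i, aa in enumerate(protein + "*"):
        if aa == "*":
            if i - start >= min_len:
                segments.append(
                    (frame + 3 * start, frame + 3 * i, protein[start:i])
                )
            start = i + 1
    return segments


def candidate_splice_sites(seq):
    """List 0-based positions of every GT (donor) and AG (acceptor)."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Toy sequence, not real data.
toy = "ATGGCCTAAGGGTTTTGA"
print(translate(toy))
print(stop_free_segments(toy))
print(candidate_splice_sites(toy))
```

On real scaffolds the GT/AG list will be very long, since these dinucleotides occur by chance all over the sequence, so this only narrows down where boundaries could be rather than placing them.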