How to obtain Coding sequence from Genomic sequence?
10.4 years ago
MAPK ★ 2.1k

Hi Guys,

I have some assembled genomic sequences, and I know the exact frames I need to translate to obtain the coding sequence and putative protein sequences. However, I do not know exactly where to cut the translated frame, that is, where the coding region ends and where the splice sites at the intron/exon boundaries lie. Is there a way to get the coding region from the given frames? Please share your knowledge.

Thank you!

coding genome • 2.5k views

Please show what you have; if you know the exact frame, then that's where to cut!


Thank you for your reply, Karl. By "assembled" I mean that the genomic sequences are still unannotated and there are multiple scaffolds. Suppose I have three scaffolds for a particular protein-coding gene and I have to translate each scaffold in two different frames. After merging all the translated frames, I end up with an unreasonably long peptide sequence/coding region, because the translation runs past the 'GT' and 'AG' splice boundaries and so includes non-coding regions as well. In this case I think I need transcriptomic sequences to infer the coding regions unambiguously. Please correct me if that is wrong. Thanks again!
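
To make the problem concrete, here is a minimal Python sketch of what translating one frame gives you. It assumes the standard genetic code and a forward-strand sequence; the helper names `translate` and `open_stretches` are made up for this illustration. It only splits a translated frame at stop codons and reports nucleotide coordinates of each stop-free stretch. It does not locate 'GT'/'AG' splice sites, so an intron without an in-frame stop would still be read through.

```python
# Standard genetic code (NCBI table 1), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)


def translate(seq, frame=0):
    """Translate seq in the given 0-based frame; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def open_stretches(seq, frame=0, min_aa=1):
    """Return (nt_start, nt_end, peptide) for each stop-free stretch.

    Coordinates are 0-based, end-exclusive, on the input sequence.
    """
    results = []
    aa_start = 0
    for peptide in translate(seq, frame).split("*"):
        if len(peptide) >= min_aa:
            nt_start = frame + 3 * aa_start
            results.append((nt_start, nt_start + 3 * len(peptide), peptide))
        aa_start += len(peptide) + 1  # skip the stop codon itself
    return results


if __name__ == "__main__":
    toy = "ATGGCCTAAGGTAAGTTT"  # toy sequence, not real data
    for start, end, pep in open_stretches(toy, frame=0):
        print(start, end, pep)
```

Raising `min_aa` filters out short spurious stretches, but choosing exon boundaries inside a stretch still needs extra evidence such as transcript alignments.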

10.2 years ago
Manvendra Singh ★ 2.2k

There is an easy way to do it: the Coding Potential Calculator (CPC).

It takes FASTA sequences as input.

Once you submit your FASTA file, you get a CP score for each sequence. A score > 1 indicates a potential protein-coding sequence, and a score < -1 indicates a potential non-coding RNA.

CPC can reliably discriminate coding from non-coding transcripts with ~98% accuracy.
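
If you have many sequences, the thresholds above are easy to apply in a script. This is only a sketch: it assumes you have already collected the scores into a dictionary yourself (the sequence names and values below are invented), and the "ambiguous" label for scores between -1 and 1 is my own placeholder, not a CPC category.

```python
def classify_cpc(score, cutoff=1.0):
    """Label a CP score using the > 1 / < -1 rule of thumb."""
    if score > cutoff:
        return "coding"
    if score < -cutoff:
        return "noncoding"
    return "ambiguous"  # assumption: no call between -cutoff and +cutoff


if __name__ == "__main__":
    # Hypothetical scores keyed by sequence name, for illustration only.
    scores = {"scaffold_1": 2.7, "scaffold_2": -1.8, "scaffold_3": 0.4}
    for name, score in scores.items():
        print(name, score, classify_cpc(score))
```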
