Question

Finding the transcript for a UniProt canonical sequence (or ID)

9.4 years ago

ajingnk ▴ 130

I have a UniProt canonical sequence and its UniProt ID (and the gene name). Is there an easy way (e.g. with Biopython) to get the transcript sequence for a UniProt protein sequence? I see many corresponding Ensembl transcripts for a single UniProt entry, but I only need the one that corresponds to the canonical sequence. Is there an easy way to find it?

Thanks

uniprot sequence • 3.6k views

updated 23 months ago by Ram 44k • written 9.4 years ago by ajingnk ▴ 130

Ram · Answer 1 · 2015-06-29

9.4 years ago

Elisabeth Gasteiger ★ 2.4k

This is unfortunately not possible in most cases: the canonical protein sequence is the outcome of thorough curation work, which often involves merging various sequences encoded by the same gene (in one species). For more details please have a look at http://www.uniprot.org/help/canonical_nucleotide
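
One way to see this in practice is to translate a candidate coding sequence (CDS) and test whether it reproduces the canonical sequence exactly; because the canonical sequence may be a curated merge, often no candidate will pass. The following is a minimal, standard-library-only sketch using the standard genetic code (NCBI table 1); the function names are hypothetical, and Biopython's `Seq(...).translate(to_stop=True)` would do the translation step equally well. It does not handle alternative initiation codons or selenocysteine.

```python
# Standard genetic code (NCBI table 1); codons are enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a CDS codon by codon (hypothetical helper).

    Unknown codons (e.g. containing N) become 'X', an incomplete final codon
    is ignored, and a single trailing stop codon is dropped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    protein = "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))
    return protein[:-1] if protein.endswith("*") else protein


def matches_canonical(cds, canonical):
    """True only if the CDS translates to exactly the canonical protein sequence."""
    return translate(cds) == canonical.upper()
```

An exact-match test is deliberately strict: a CDS that differs from the canonical sequence by even one residue (a variant, a conflict, an alternative start) is rejected, which mirrors why a single "canonical transcript" often cannot be named.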

updated 23 months ago by Ram 44k • written 9.4 years ago by Elisabeth Gasteiger ★ 2.4k

Thanks Elisabeth! I saw that "Sequence databases" are listed as cross-references. Is it possible to retrieve the nucleotide sequence through those sequence databases? I just want some rough results, so I may not need every corresponding residue. I want to know all possible mutations for one protein from a transcript.

updated 23 months ago by Ram 44k • written 9.4 years ago by ajingnk ▴ 130

Sorry for not replying earlier.

You can indeed follow the link to EMBL/GenBank/DDBJ, but the problem is that there can be more than one such link and you may have difficulty choosing one, as described in the help document cited above. But if you do not mind which nucleotide sequence(s) to retrieve, you can of course use the protein_id or nucleotide accession number from the cross-referenced sequence databases to retrieve the sequence from the nucleotide sequence databases. In case of doubt, I suggest that you contact EMBL/GenBank/DDBJ directly about the best way to access their data programmatically.
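
A minimal sketch of that route, using only the Python standard library (Biopython's `Bio.SwissProt` and `Bio.Entrez` modules could be used instead). It reads the `DR   EMBL` cross-reference lines of a UniProt flat-file entry, then downloads the CDS sequences of each cross-referenced nucleotide record and keeps the one whose `protein_id` matches. Assumptions, not taken from the thread: the UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/{acc}.txt`, NCBI E-utilities `efetch` with `rettype=fasta_cds_na`, and FASTA headers that carry a `[protein_id=...]` tag; all helper names are hypothetical. It yields every mRNA-derived candidate rather than picking one, since there is no reliable way to choose.

```python
import re
import urllib.request

# Assumed endpoints (hypothetical choice; check current UniProt/NCBI documentation).
UNIPROT_TXT = "https://rest.uniprot.org/uniprotkb/{acc}.txt"
NCBI_EFETCH = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
               "?db=nuccore&id={acc}&rettype=fasta_cds_na&retmode=text")

# UniProt flat-file cross-reference lines look like:
# "DR   EMBL; <nucleotide acc>; <protein_id>; <status>; <molecule type>."
DR_EMBL = re.compile(
    r"^DR   EMBL; ([^;\n]+); ([^;\n]+); ([^;\n]+); ([^.\n]+)\.", re.M)


def fetch_text(url):
    """Download a URL and return its body as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def embl_xrefs(entry_text):
    """Return (nucleotide_acc, protein_id, status, mol_type) tuples
    from the DR EMBL lines of a UniProt flat-file entry."""
    return DR_EMBL.findall(entry_text)


def cds_for_protein_id(cds_fasta, protein_id):
    """Return (header, sequence) for the CDS whose FASTA header carries
    [protein_id=...], or None if no such CDS is present."""
    tag = f"[protein_id={protein_id}]"
    for block in cds_fasta.split(">")[1:]:
        header, _, seq = block.partition("\n")
        if tag in header:
            return header, seq.replace("\n", "")
    return None


def candidate_cds(uniprot_acc, mol_types=("mRNA",)):
    """Yield (protein_id, CDS sequence) for each EMBL cross-reference of
    the given molecule types. Several may be returned; choosing is up to you."""
    entry = fetch_text(UNIPROT_TXT.format(acc=uniprot_acc))
    for nuc_acc, protein_id, _status, mol_type in embl_xrefs(entry):
        # "-" means no protein_id (e.g. status NOT_ANNOTATED_CDS).
        if mol_type not in mol_types or protein_id == "-":
            continue
        fasta = fetch_text(NCBI_EFETCH.format(acc=nuc_acc))
        hit = cds_for_protein_id(fasta, protein_id)
        if hit:
            yield protein_id, hit[1]
```

Called as `for pid, seq in candidate_cds(accession): ...` with your UniProt accession, it makes one NCBI request per cross-reference; without an API key NCBI asks clients to stay at or below three requests per second, so add a pause for larger batches.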

updated 23 months ago by Ram 44k • written 9.4 years ago by Elisabeth Gasteiger ★ 2.4k