Recover the corresponding nucleotide ID from a protein ID
5.1 years ago
Chvatil ▴ 130

Hello, does anyone know an automated way to recover the nucleotide ID and/or the nucleotide sequence from a protein sequence ID?

For example, I have the protein sequence ID:

YP_009344822.1

which is :

>YP_009344822.1 transcription associated protein [Pea leaf distortion virus]
MRSSSPSKDHYTQVPIKVQHREAKKRNRRRRVDLECGCSYFLSLNCFNHGFTHRGTHHCSSSMEWRLYLG
SSKSPLFQDPQPRQPSIHDEHGHHQDQDPIQLQPSESSGSAQVFSDLPNLDDLTPSDWSFLKSIQNPSPQ
VSHKSGCNLN

and I would like to recover its nucleotide ID (NC_033554.1):

>NC_033554.1:c1601-1149 Pea leaf distortion virus clone N36-41 segment DNA-A, complete sequence
ATGCGATCTTCATCACCCTCGAAGGACCACTATACTCAGGTTCCAATCAAAGTACAGCACAGGGAAGCGA
AGAAGCGCAACAGGAGGAGGAGAGTCGATCTTGAATGCGGGTGTTCTTATTTTCTATCTCTAAACTGCTT
CAACCATGGATTTACGCACAGGGGGACCCATCACTGCAGCTCAAGCATGGAGTGGCGCCTATATCTGGGA
AGTTCCAAATCCCCTCTATTTCAAGATCCTCAGCCACGACAACCGTCCATTCACGATGAACATGGACATC
ATCAAGATCAGGATCCAATTCAACTACAACCTTCGGAGAGCTCTGGGAGTGCACAAGTGTTTTCTGACCT
ACCGAATCTGGACGACCTTACACCCTCAGACTGGTCTTTTCTTAAAAGTATTCAAAACCCAAGTCCTCAA
GTATCTCACAAATCTGGGTGTAATCTCAATTAA

Thank you for your help.

fasta python ncbi • 770 views
5.1 years ago
GenoMax 147k

Use NCBI EntrezDirect:

$ esearch -db protein -query "YP_009344822.1" | elink -target nuccore | efetch -format acc
NC_033554.1
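
Since the question is tagged python, here is a minimal sketch of the same protein-to-nuccore link lookup done directly against the NCBI E-utilities ELink endpoint with only the standard library. The function names (`build_elink_url`, `parse_linked_ids`, `protein_to_nucleotide`) are hypothetical helpers made up for this illustration, and the XML path it reads (`LinkSetDb/Link/Id`) is an assumption about the ELink response layout that is worth checking against a real response before relying on it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(protein_acc):
    """Build an ELink request that links a protein record to nuccore."""
    params = {
        "dbfrom": "protein",
        "db": "nuccore",
        "id": protein_acc,
        "idtype": "acc",  # ask for accession.version rather than GI numbers
    }
    return EUTILS_ELINK + "?" + urllib.parse.urlencode(params)


def parse_linked_ids(xml_text):
    """Return the unique linked IDs found in an ELink XML response.

    Assumes the linked records appear under LinkSetDb/Link/Id.
    """
    root = ET.fromstring(xml_text)
    ids = []
    for link_id in root.findall(".//LinkSetDb/Link/Id"):
        if link_id.text and link_id.text not in ids:
            ids.append(link_id.text)
    return ids


def protein_to_nucleotide(protein_acc):
    """Query NCBI (network access required) and return linked nucleotide IDs."""
    with urllib.request.urlopen(build_elink_url(protein_acc)) as response:
        return parse_linked_ids(response.read().decode("utf-8"))
```

Calling `protein_to_nucleotide("YP_009344822.1")` would send one request to NCBI. For many IDs, the Entrez Direct pipeline above is probably the simpler choice, and NCBI's request-rate limits apply either way.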
5.1 years ago

Hi,

the easiest way I've found to get the nucleotide sequence for a protein from its ID is to use efetch (from Entrez Direct):

efetch -db protein -id YP_009344822.1 -format fasta_cds_na
>lcl|NC_033554.1_cds_YP_009344822.1_1 [gene=TrAP] [locus_tag=BZL59_gp4] [db_xref=GeneID:30906012] [protein=transcription associated protein] [protein_id=YP_009344822.1] [location=complement(1149..1601)] [gbkey=CDS]
ATGCGATCTTCATCACCCTCGAAGGACCACTATACTCAGGTTCCAATCAAAGTACAGCACAGGGAAGCGA
AGAAGCGCAACAGGAGGAGGAGAGTCGATCTTGAATGCGGGTGTTCTTATTTTCTATCTCTAAACTGCTT
CAACCATGGATTTACGCACAGGGGGACCCATCACTGCAGCTCAAGCATGGAGTGGCGCCTATATCTGGGA
AGTTCCAAATCCCCTCTATTTCAAGATCCTCAGCCACGACAACCGTCCATTCACGATGAACATGGACATC
ATCAAGATCAGGATCCAATTCAACTACAACCTTCGGAGAGCTCTGGGAGTGCACAAGTGTTTTCTGACCT
ACCGAATCTGGACGACCTTACACCCTCAGACTGGTCTTTTCTTAAAAGTATTCAAAACCCAAGTCCTCAA
GTATCTCACAAATCTGGGTGTAATCTCAATTAA

Then do some post-processing to get the header in the format you want.
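
As one possible way to do that post-processing, here is a small Python sketch that rewrites a `fasta_cds_na` header into the `>NC_033554.1:c1601-1149` coordinate style used in the question. The function names and regular expressions are my own assumptions, based only on the header shown above. Only simple `start..end` and `complement(start..end)` locations are handled, and anything else (for example a `join(...)` location) is left unchanged.

```python
import re

# Matches headers like:
# >lcl|NC_033554.1_cds_YP_009344822.1_1 [...] [location=complement(1149..1601)] [...]
HEADER_RE = re.compile(
    r"^>lcl\|(?P<nuc>.+?)_cds_(?P<prot>.+?)_\d+\s.*?\[location=(?P<loc>[^\]]+)\]"
)

# A single interval, optionally wrapped in complement(...); '<' and '>' mark
# partial ends and are ignored here.
SIMPLE_LOC_RE = re.compile(
    r"(?P<comp>complement\()?<?(?P<start>\d+)\.\.>?(?P<end>\d+)(?(comp)\))"
)


def reformat_header(header):
    """Return '>ACC:start-end' (or '>ACC:cend-start' on the minus strand)."""
    match = HEADER_RE.match(header)
    if match is None:
        return header  # not a fasta_cds_na style header, leave as-is
    loc = SIMPLE_LOC_RE.fullmatch(match.group("loc"))
    if loc is None:
        return header  # joined or otherwise complex location, leave as-is
    if loc.group("comp"):
        return f">{match.group('nuc')}:c{loc.group('end')}-{loc.group('start')}"
    return f">{match.group('nuc')}:{loc.group('start')}-{loc.group('end')}"


def reformat_fasta(text):
    """Apply reformat_header to every header line of a FASTA string."""
    lines = []
    for line in text.splitlines():
        lines.append(reformat_header(line) if line.startswith(">") else line)
    return "\n".join(lines)
```

You could save the efetch output to a file, read it into a string, and pass it through `reformat_fasta`. The sequence lines are not modified.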
