Download peptide sequences from NCBI using python
2.0 years ago
יובל • 0

I would like to extract the peptide sequence of the following: NM_021969.2 from the NCBI website as shown in this link: https://www.ncbi.nlm.nih.gov/nuccore/NM_021969.2

I was able to extract the nucleotide sequence using the script below, but I have not been able to extract the following peptide sequence:

MSTSQPGACPCQGAASRPAILYALLSSSLKAVPRPRSRCLCRQH
RPVQLCAPHRTCREALDVLAKTVAFLRNLPSFWQLPPQDQRRLLQGCWGPLFLLGLAQ
DAVTFEVAEAPVPSILKKILLEEPSSSGGSGQLPDRPQPSLAAVQWLQCCLESFWSLE
LSPKEYACLKGTILFNPDVPGLQAASHIGHLQQEAHWVLCEVLEPWCPAAQGRLTRVL
LTASTLKSIPTSLLGDLFFRPIIGDVDIAGLLGDMLLLR

Python script:

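from Bio import Entrez

# NCBI asks for a contact email with every E-utilities request
Entrez.email = 'myemail@gmail.com'

# Fetch the nucleotide record in FASTA format
handle = Entrez.efetch(db='nuccore', id='NM_021969.2', rettype='fasta')

print(handle.read())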

I would appreciate some help if anyone has succeeded.

Yuval

ncbi python biopython • 1.3k views
2.0 years ago
iraun 6.2k

Hi! Welcome to Biostars :).

Try this:

Entrez.efetch(db="protein", id="NM_021969.2", rettype="fasta")
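
For readers who want the translation tied to the nucleotide record itself, here is a minimal sketch of an alternative, not taken from the answer above: it asks the nuccore database for the CDS translation via the E-utilities return type `fasta_cds_aa`. The function names `fetch_cds_protein_fasta` and `first_sequence` are hypothetical helpers introduced for illustration, and the exact header format of the returned FASTA is not assumed here.

```python
def fetch_cds_protein_fasta(accession, email):
    """Fetch the CDS translation(s) of a nucleotide record as FASTA text.

    Assumption: rettype="fasta_cds_aa" on db="nuccore" returns the protein
    product(s) of the record's CDS features.
    """
    # Biopython is imported here so the parsing helper below works without it.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(
        db="nuccore", id=accession, rettype="fasta_cds_aa", retmode="text"
    )
    try:
        return handle.read()
    finally:
        handle.close()


def first_sequence(fasta_text):
    """Return (header, sequence) for the first record in FASTA text."""
    header, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif line and header is not None:
            chunks.append(line)
    return header, "".join(chunks)
```

Calling `first_sequence(fetch_cds_protein_fasta("NM_021969.2", "myemail@gmail.com"))` should then give the header and the peptide as a single string, provided the record has one CDS.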

Thanks for your help, it solved my problem.


A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

2.0 years ago
MirianT_NCBI ▴ 760

Hi,
You can use NCBI Datasets. To download the protein sequence associated with this nucleotide record, you can use the following command:

datasets download gene accession NM_021969.2 --include protein

This command will download a zip file, with the following contents:

ncbi_dataset
`-- data
    |-- data_report.jsonl
    |-- dataset_catalog.json
    `-- protein.faa

By default, an NCBI Datasets gene data package includes transcript and protein sequences, as well as metadata in JSON Lines format. You can include other files (if available) using the --include flag, as shown above.
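
Since the question asked for Python, the protein FASTA can also be read straight out of the downloaded archive without unzipping it by hand. This is a rough sketch using only the standard library. The member path comes from the listing above. The archive name `ncbi_dataset.zip` in the usage comment is an assumption about the default output file name, and `read_protein_faa` is a hypothetical helper name.

```python
import zipfile

# Path of the protein FASTA inside the archive, per the listing above.
PROTEIN_MEMBER = "ncbi_dataset/data/protein.faa"


def read_protein_faa(zip_path):
    """Return the contents of protein.faa from a Datasets zip as text."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.read(PROTEIN_MEMBER).decode("utf-8")


# Usage (assumes the download was saved as ncbi_dataset.zip):
# print(read_protein_faa("ncbi_dataset.zip"))
```

If the archive does not contain `protein.faa`, `archive.read` raises `KeyError`, which is a quick way to notice that the protein file was not included in the download.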

Feel free to reach out if you have any additional questions. I hope it helps :)
