Downloading fasta sequence for a PDB entry

0

5.7 years ago

henriquezvera.95 • 0

I would like to know if it is possible to download the FASTA sequence for a PDB entry using Biopython.

genome biopython • 3.8k views

Updated 5.7 years ago by Sej Modha 5.3k • written 5.7 years ago by henriquezvera.95 • 0

0

Please read before posting a question: How To Ask A Good Question. What have you tried so far?

You can use the NCBI Entrez Direct (EDirect) command-line utilities:

esearch -db protein -query '1REV[All Fields] AND pdb[filter]' | efetch -format fasta

5.7 years ago by Sej Modha 5.3k

0

There was a post some time ago:

How download a sequence fasta from PDB using biopython / python?

5.7 years ago by natasha.sernova ★ 4.0k

2

5.7 years ago

Sej Modha 5.3k

	#!/usr/bin/env python3
	# -*- coding: utf-8 -*-
	"""
	Created on Thu Sep 5 16:54:59 2019

	@author: sejmodha
	"""
	from Bio import Entrez, SeqIO

	# NCBI asks for a contact address; replace this with your own.
	Entrez.email = "foo@bar.com"

	query = '1REV[All Fields] AND pdb[filter]'

	# Search the protein database for records linked to this PDB entry.
	handle = Entrez.esearch(db="protein", term=query)
	records = Entrez.read(handle)
	id_list = records['IdList']
	handle.close()

	# Fetch each hit as FASTA and print it.
	for each_id in id_list:
	    fasta = Entrez.efetch(db="protein", id=each_id, rettype="fasta", retmode="text")
	    fasta_record = SeqIO.read(fasta, "fasta")
	    fasta.close()
	    print(f'>{fasta_record.id}|{fasta_record.description}\n{fasta_record.seq}')
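If the entry has many chains, a variant of the script above can fetch all hits in one efetch request instead of one request per ID. This is only a sketch: the function names `pdb_query` and `fetch_chain_fastas` are made up for illustration, and it assumes Biopython is installed and NCBI is reachable.

```python
def pdb_query(pdb_id: str) -> str:
    """Build the same Entrez search term as the script above (hypothetical helper)."""
    return f"{pdb_id.strip()}[All Fields] AND pdb[filter]"


def fetch_chain_fastas(pdb_id: str, email: str) -> list:
    """Return the FASTA records NCBI lists for pdb_id (hypothetical helper)."""
    # Imported here so pdb_query() can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email

    handle = Entrez.esearch(db="protein", term=pdb_query(pdb_id))
    id_list = Entrez.read(handle)["IdList"]
    handle.close()
    if not id_list:
        return []

    # A comma-separated ID list lets efetch return every record in one response.
    handle = Entrez.efetch(
        db="protein", id=",".join(id_list), rettype="fasta", retmode="text"
    )
    records = list(SeqIO.parse(handle, "fasta"))
    handle.close()
    return records
```

Something like `for rec in fetch_chain_fastas("1REV", "you@example.org"): print(rec.format("fasta"), end="")` would then print every record; the email address is a placeholder.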

0

5.7 years ago

Joe 22k

Kind of a hacky solution (since it downloads the PDB first technically) but here's something you can use as a one-liner:

$ wget -O - https://files.rcsb.org/download/1A80.pdb 2>/dev/null \
   | python -c "import sys; from Bio import SeqIO; SeqIO.convert(sys.stdin, 'pdb-atom', sys.stdout, 'fasta')"
>1A80:A
TVPSIVLNDGNSIPQLGYGVFKVPPADTQRAVEEALEVGYRHIDTAAIYGNEEGVGAAIA
ASGIARDDLFITTKLWNDRHDGDEPAAAIAESLAKLALDQVDLYLVHWPTPAADNYVHAW
EKMIELRAAGLTRSIGVSNHLVPHLERIVAATGVVPAVNQIELHPAYQQREITDWAAAHD
VKIESWGPLGQGKYDLFGAEPVTAAAAAHGKTPAQAVLRWHLQKGFVVFPKSVRRERLEE
NLDVFDFDLTDTEIAAIDAMDPGDGSGRVSAHPDEVD

Just replace 1A80 in the wget link with whatever PDB ID you're interested in. SeqIO only parses data and doesn't download it, so you need to pass it the file somehow. I've elected to do this in the shell. You could also do it natively in Python, but it's more complicated (IMO).

If you want to save it as a file, stick a redirect to a file at the end of the command:

(previous command)... > pdbsequence.fa
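For anyone who would rather stay in Python, the wget pipeline above could be sketched roughly like this. It uses the same download URL and the same 'pdb-atom' conversion as the one-liner; the function names `pdb_download_url` and `pdb_to_fasta` are my own, uppercasing the ID is an assumption to match the URL above, and it needs Biopython plus network access.

```python
import io
import urllib.request


def pdb_download_url(pdb_id: str) -> str:
    """Build the RCSB download URL used in the wget command (hypothetical helper)."""
    # Uppercasing is an assumption, chosen to match the URL shown above.
    return f"https://files.rcsb.org/download/{pdb_id.strip().upper()}.pdb"


def pdb_to_fasta(pdb_id: str) -> str:
    """Download a PDB entry and return its chains as FASTA text (hypothetical helper)."""
    # Imported here so the URL helper works without Biopython installed.
    from Bio import SeqIO

    with urllib.request.urlopen(pdb_download_url(pdb_id)) as response:
        pdb_text = response.read().decode("utf-8")

    fasta = io.StringIO()
    # 'pdb-atom' takes the sequence from the ATOM records, as in the one-liner.
    SeqIO.convert(io.StringIO(pdb_text), "pdb-atom", fasta, "fasta")
    return fasta.getvalue()
```

`print(pdb_to_fasta("1A80"))` should then behave like the shell version, and writing the returned string to pdbsequence.fa replaces the redirect.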
