I would like to download all hemagglutinin structures for influenza virus from the Protein Data Bank via a Python script. I have looked through the PDB website and the Biopython Bio.PDB package for a way to do this, with no luck. Does anyone know if this is possible?
You need to know all the PDB IDs you want to download and list them yourself; the program will then download them automatically. Go to the PDB, search for what you are interested in, select all the IDs you think are relevant, and then go to Reports ---> List selected IDs.
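from Bio.PDB import PDBList

# Download a hand-picked list of PDB entries into the local "PDB" directory.
pdbl = PDBList()
pdb_ids = ['4B97', '4IPH', '4HNO', '4HG7', '4IRG', '4G4W', '4JKW', '4IPC', '2YPM', '4KEI']
for pdb_id in pdb_ids:
    # One file per ID is fetched from the PDB server and saved under pdir.
    pdbl.retrieve_pdb_file(pdb_id, pdir='PDB')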
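If the list of selected IDs is long, typing it into the script by hand gets error-prone. Here is a minimal sketch that takes the exported ID list as text and cleans it up before downloading. The helper names `parse_id_list` and `download_all` are made up for this example, and I am assuming the export is separated by commas and/or whitespace and that every ID is the classic 4-character kind.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three letters or digits.
PDB_ID_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")


def parse_id_list(text):
    """Split a pasted ID list into unique, upper-case PDB IDs (order kept)."""
    ids = []
    for token in re.split(r"[,\s]+", text.strip()):
        # Skip empty strings and anything that does not look like a PDB ID.
        if PDB_ID_PATTERN.fullmatch(token):
            pdb_id = token.upper()
            if pdb_id not in ids:
                ids.append(pdb_id)
    return ids


def download_all(pdb_ids, pdir="PDB"):
    """Download each ID with Biopython (imported here so parsing works without it)."""
    from Bio.PDB import PDBList

    pdbl = PDBList()
    for pdb_id in pdb_ids:
        pdbl.retrieve_pdb_file(pdb_id, pdir=pdir)


if __name__ == "__main__":
    exported = "4B97, 4IPH, 4HNO, 4HG7"  # paste the exported list here
    download_all(parse_id_list(exported))
```

Dropping malformed tokens silently is a choice I made for the sketch; you may prefer to raise an error so a bad paste does not go unnoticed.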
Have a look at the PDB's REST APIs, their documentation, and the example Python program provided on the same site.
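To give an idea of what a search through a web API could look like, here is a rough sketch using only the standard library. The endpoint URL, the JSON query layout, and the `result_set` / `identifier` fields in the response are my assumptions about the RCSB search service; the PDB's web services have changed over time, so check all of them against the current documentation. The function names are invented for this example.

```python
import json
import urllib.request

# Assumed endpoint of the RCSB search service; verify against the current docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_query(text, rows=100, start=0):
    """Build a full-text search request that asks for entry (PDB ID) results."""
    return {
        "query": {
            "type": "terminal",
            "service": "full_text",
            "parameters": {"value": text},
        },
        "return_type": "entry",
        "request_options": {"paginate": {"start": start, "rows": rows}},
    }


def extract_ids(response):
    """Pull the PDB IDs out of a decoded search response (assumed layout)."""
    return [hit["identifier"] for hit in response.get("result_set", [])]


def search_pdb(text, rows=100):
    """POST the query and return the matching PDB IDs (needs network access)."""
    payload = json.dumps(build_query(text, rows)).encode("utf-8")
    request = urllib.request.Request(
        SEARCH_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        body = reply.read()
    # An empty body is treated as "no hits".
    return extract_ids(json.loads(body)) if body else []


if __name__ == "__main__":
    print(search_pdb("influenza hemagglutinin", rows=25))
```

A plain full-text search will also return entries that only mention hemagglutinin, so the resulting IDs still need a manual check before they are passed to `PDBList.retrieve_pdb_file`.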
How do I download the PDB IDs of an entire set of soluble enzymes from the PDB, then select only the non-membrane-bound enzymes from the list, while removing redundant sequences by keeping only the highest-resolution ones?