Hi,
The following piece of code uses Biopython to split all models from a valid PDB file into individual files.
It takes a PDB-formatted filename and a PDB code as input (arguments -f and -p respectively) and writes each existing model to a different output PDB file (with filename format "PDBcode"-"model".pdb).
My concern is that Biopython always numbers models starting from 0, so the real model numbers cannot be used directly in the names of the split files. (We could always add 1 to these numbers, but what if the PDB file does not start at model 1, or its model numbers are not consecutive?)
Is there a Biopython-compatible way of getting the actual model numbers from a PDB file and using them as part of the filenames for the individual split models?
Thanks!
#!/usr/bin/python
import getopt
import sys

from Bio.PDB import PDBParser, PDBIO


def main(argv):
    # Parse input arguments
    pdbfile = ''
    pdbid = ''
    try:
        opts, args = getopt.getopt(argv, "hf:p:", ["pdbfile=", "pdbid="])
    except getopt.GetoptError:
        print('split_pdb_models.py -f <pdbfile> -p <pdbid>')
        sys.exit(2)
    # Process input arguments
    for opt, arg in opts:
        if opt == '-h':
            print('split_pdb_models.py -f <pdbfile> -p <pdbid>')
            sys.exit()
        elif opt in ("-f", "--pdbfile"):
            pdbfile = arg
        elif opt in ("-p", "--pdbid"):
            pdbid = arg
    # Split models from PDB into individual files
    pdb = PDBParser(QUIET=True)
    structure = pdb.get_structure(pdbid, pdbfile)
    io = PDBIO()
    for model in structure:
        io.set_structure(model)
        # model.get_id() is Biopython's 0-based index, not the MODEL serial number
        io.save(structure.get_id() + "-" + str(model.get_id()) + ".pdb")
    print('Input file is: ' + pdbfile)


if __name__ == "__main__":
    main(sys.argv[1:])
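
One possible workaround, sketched here under assumptions rather than offered as the definitive Biopython way, is to read the serial numbers straight from the MODEL records of the file and pair them, in order, with the models Biopython returns. The helper names `read_model_serials` and `model_filename` are hypothetical and not part of Biopython. The sketch assumes every model in the file is introduced by a well-formed MODEL record. Recent Biopython releases also appear to keep this number on each model as a `serial_num` attribute, which would be worth checking on the installed version before relying on it.

```python
def read_model_serials(pdb_lines):
    """Return the serial numbers of all MODEL records, in file order.

    Hypothetical helper: it only looks at the text of the PDB file and
    does not use Biopython at all.
    """
    serials = []
    for line in pdb_lines:
        # The record name occupies the first six columns of a PDB line.
        if line[:6].strip() != "MODEL":
            continue
        fields = line.split()
        # Skip malformed MODEL records that carry no serial number.
        if len(fields) > 1 and fields[1].isdigit():
            serials.append(int(fields[1]))
    return serials


def model_filename(pdbid, serial):
    """Build the output name in the "PDBcode"-"model".pdb format."""
    return "{}-{}.pdb".format(pdbid, serial)


# Tiny in-memory example with non-consecutive model numbers.
example_lines = [
    "MODEL        3",
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00  0.00           N",
    "ENDMDL",
    "MODEL        7",
    "ATOM      1  N   MET A   1      11.204  13.307   2.200  1.00  0.00           N",
    "ENDMDL",
]
serials = read_model_serials(example_lines)
filenames = [model_filename("1abc", s) for s in serials]
print(filenames)
```

In the script above, the loop could then become `for model, serial in zip(structure, serials)` with `io.save(model_filename(pdbid, serial))`. Note that a file without any MODEL record yields one Biopython model but an empty list of serials, so that case would need its own fallback.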