Get real model numbers from PDB using BioPython
4.7 years ago
NPalopoli ▴ 290

Hi,

The following piece of code uses Biopython to split all models from a valid PDB file into individual files.

It takes a PDB-formatted filename and a PDB code as input (arguments -f and -p respectively) and writes each existing model to a different output PDB file (with filename format "PDBcode"-"model".pdb).

My concern is that BioPython always numbers models starting from 0, so the real model numbers cannot be used directly in the names of the split files. (We could add 1 to these numbers, but what if the PDB file does not start at model 1, or its model numbers are not consecutive?)

Is there a BioPython-compatible way to get the actual model numbers from a PDB file and use them in the filenames of the individual split models?

Thanks!

#!/usr/bin/python
import sys, getopt
from Bio.PDB import PDBParser, PDBIO

USAGE = 'split_pdb_models.py -f <pdbfile> -p <pdbid>'

def main(argv):
  # Parse input arguments
  pdbfile = ''
  pdbid = ''
  try:
    opts, args = getopt.getopt(argv,"hf:p:",["pdbfile=","pdbid="])
  except getopt.GetoptError:
    print(USAGE)
    sys.exit(2)

  # Process input arguments
  for opt, arg in opts:
    if opt == '-h':
      print(USAGE)
      sys.exit()
    elif opt in ("-f", "--pdbfile"):
      pdbfile = arg
    elif opt in ("-p", "--pdbid"):
      pdbid = arg

  # Split models from PDB into individual files
  parser = PDBParser(QUIET=True)
  structure = parser.get_structure(pdbid, pdbfile)
  io = PDBIO()
  for model in structure:
    io.set_structure(model)
    # model.get_id() is BioPython's 0-based index, not the number in the MODEL record
    io.save(structure.get_id() + "-" + str(model.get_id()) + ".pdb")
  print('Input file is: ' + pdbfile)

if __name__ == "__main__":
  main(sys.argv[1:])
structure biopython proteins python2
4.7 years ago
jgreener ▴ 390

Yes, you can use the attribute "model.serial_num" to get the model number that was actually in the file (with no off-by-one shift), though I'm not sure this is documented anywhere.

See https://github.com/biopython/biopython/blob/master/Bio/PDB/PDBParser.py#L326-L334 and https://github.com/biopython/biopython/blob/20cb4d67adec136dcbe9bf01b29d56d39cdaa513/Bio/PDB/StructureBuilder.py#L69-L78 for where this happens in the code. It can also be seen in https://github.com/biopython/biopython/blob/1a5a47029ec69f839c0507fad0991c1f1a6ccbf3/Bio/PDB/Model.py that there is no "getter" function for this, but you can still access it directly with "model.serial_num".
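
As a rough sketch of how this could be applied to the script in the question: the loop can build each filename from "model.serial_num" instead of "model.get_id()". The helper names model_filenames and split_models are hypothetical, introduced only for this illustration; the Biopython calls are the same ones used in the question. One caveat, as far as I can tell from the linked source: when a file has no MODEL records, serial_num appears to fall back to the internal id, so it may not be meaningful for such files.

```python
def model_filenames(structure_id, models):
    """Pair each model with '<structure_id>-<serial>.pdb'.

    Uses model.serial_num (the number read from the MODEL record)
    rather than model.get_id() (BioPython's 0-based index).
    Hypothetical helper, not part of Biopython.
    """
    return [(model, "%s-%d.pdb" % (structure_id, model.serial_num))
            for model in models]


def split_models(pdbid, pdbfile):
    """Write every model of pdbfile to its own file; return the filenames.

    Hypothetical wrapper around the loop from the question.
    """
    # Imported here so the filename helper above works without Biopython.
    from Bio.PDB import PDBParser, PDBIO

    parser = PDBParser(QUIET=True)
    structure = parser.get_structure(pdbid, pdbfile)
    io = PDBIO()
    written = []
    for model, filename in model_filenames(structure.get_id(), structure):
        io.set_structure(model)
        io.save(filename)
        written.append(filename)
    return written
```

With this, something like split_models("1abc", "1abc.pdb") (placeholder ID and filename) should name each output file after the serial number in the corresponding MODEL record, so gaps or a non-1 starting number in the input carry over to the filenames.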
