Question

Get real model numbers from PDB using BioPython


5.4 years ago

NPalopoli ▴ 290

Hi,

The following piece of code uses Biopython to split all models from a valid PDB file into individual files.

It takes a PDB-formatted filename and a PDB code as input (arguments -f and -p respectively) and writes each existing model to a different output PDB file (with filename format "PDBcode"-"model".pdb).

My concern is that BioPython always numbers models starting from a dummy 0, so the real model numbers cannot be used directly in the names of the split files. (We can always add 1 to these numbers, but what if the PDB does not start from model 1, or the model numbers are not consecutive?)

Is there any BioPython-compatible way of getting the actual model numbers from a PDB file and using them as part of the filenames for the split individual models?

Thanks!

#!/usr/bin/python
import sys, getopt
from Bio.PDB import PDBParser, PDBIO

def main(argv):
  # Parse input arguments
  pdbfile = ''
  pdbid = ''
  try:
    opts, args = getopt.getopt(argv,"hf:p:",["pdbfile=","pdbid="])
  except getopt.GetoptError:
    print 'split_pdb_models.py -f <pdbfile> -p <pdbid>'
    sys.exit(2)

  # Process input arguments
  for opt, arg in opts:
    if opt == '-h':
      print 'split_pdb_models.py -f <pdbfile> -p <pdbid>'
      sys.exit()
    elif opt in ("-f", "--pdbfile"):
      pdbfile = arg
    elif opt in ("-p", "--pdbid"):
      pdbid = arg

  # Split models from PDB into individual files
  pdb = PDBParser(QUIET=True)
  structure = pdb.get_structure(pdbid,pdbfile)
  for model in structure:
    io.set_structure(model)
    io.save(structure.get_id() + "-" + str(model.get_id()) + ".pdb")
  print 'Input file is:', pdbfile

if __name__ == "__main__":
  io = PDBIO()
  main(sys.argv[1:])

structure biopython proteins python2

updated 5.4 years ago by jgreener ▴ 390 • written 5.4 years ago by NPalopoli ▴ 290

score 2 · Accepted Answer · 2020-03-10

Yes, you can use the attribute "model.serial_num" to get the model number that was actually in the file's MODEL record (with no off-by-one shift), whereas "model.get_id()" returns the 0-based index assigned by the parser. I'm not sure this is documented anywhere.

See https://github.com/biopython/biopython/blob/master/Bio/PDB/PDBParser.py#L326-L334 and https://github.com/biopython/biopython/blob/20cb4d67adec136dcbe9bf01b29d56d39cdaa513/Bio/PDB/StructureBuilder.py#L69-L78 for where this happens in the code. It can also be seen in https://github.com/biopython/biopython/blob/1a5a47029ec69f839c0507fad0991c1f1a6ccbf3/Bio/PDB/Model.py that there is no "getter" function for this, but you can still access it directly with "model.serial_num".
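As a minimal sketch of how this could be used for the file splitting in the question (Python 3, assuming Biopython is installed): the tiny two-model PDB text, the structure ID "demo", and the helper name split_models are made up for illustration and are not part of the original script.

```python
import io
import os

from Bio.PDB import PDBParser, PDBIO

# Made-up two-model file: the MODEL serials (3 and 7) deliberately do not
# start at 1 and are not consecutive.
EXAMPLE_PDB = """\
MODEL        3
ATOM      1  CA  ALA A   1      11.104   6.134  -6.504  1.00  0.00           C
ENDMDL
MODEL        7
ATOM      1  CA  ALA A   1      12.560   7.210  -5.980  1.00  0.00           C
ENDMDL
END
"""


def split_models(structure, out_dir="."):
    """Write each model to '<structure id>-<MODEL serial>.pdb' in out_dir.

    Hypothetical helper: it names files after model.serial_num (the number
    read from the MODEL record) instead of model.get_id() (the 0-based index).
    Returns the list of file names written.
    """
    writer = PDBIO()
    written = []
    for model in structure:
        name = "{}-{}.pdb".format(structure.get_id(), model.serial_num)
        writer.set_structure(model)
        writer.save(os.path.join(out_dir, name))
        written.append(name)
    return written


# get_structure accepts an open handle as well as a file name.
parser = PDBParser(QUIET=True)
structure = parser.get_structure("demo", io.StringIO(EXAMPLE_PDB))

for model in structure:
    # get_id() is the parser's index; serial_num is what the file said.
    print(model.get_id(), model.serial_num)
```

Calling split_models(structure, some_directory) on this example should then produce demo-3.pdb and demo-7.pdb rather than demo-0.pdb and demo-1.pdb. Since "serial_num" has no getter, it is accessed as a plain attribute here, as described above.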