I am trying to split a large number of PDB files into their individual chains using Biopython and save each chain as a separate file named pdbid_chain.pdb. So far I have not succeeded. I am also quite new to Python.
Any help is highly appreciated!
Here is my code:

    import os
    from Bio.PDB import PDBParser, PDBIO

    # pdb_list contains a list of 208 PDB structures (file names)
    # PDB_RAW_DIR is the directory where the structures are stored
    parser = PDBParser()
    io = PDBIO()

    # parse structures
    for f in pdb_list:
        pdb_id = os.path.splitext(f)[0]  # e.g. "1abc.pdb" -> "1abc"
        pdb_parsed = parser.get_structure(pdb_id, os.path.join(str(PDB_RAW_DIR), f))
        # save chains (iterating over a Structure yields its Models)
        for model in pdb_parsed:
            for chain in model.get_chains():
                io.set_structure(chain)
                io.save(pdb_parsed.get_id() + "_" + chain.get_id() + ".pdb")
Cheers!
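For comparison, here is a minimal sketch of the same batch loop, assuming Biopython is installed. Instead of passing each Chain to set_structure, it sets the whole structure once and filters chains with a Select subclass passed to PDBIO.save(select=...). The names output_name, split_pdb_files and ChainSelect are hypothetical. Chain IDs are read from the first model only, which assumes one model per file.

```python
import os
from pathlib import Path


def output_name(pdb_path, chain_id):
    """Build the 'pdbid_chain.pdb' file name from a path such as 'dir/1abc.pdb'."""
    return f"{Path(pdb_path).stem}_{chain_id}.pdb"


def split_pdb_files(pdb_paths, out_dir="."):
    """Write every chain of every input structure to its own PDB file.

    Biopython is imported here so that output_name() can be used without it.
    """
    from Bio.PDB import PDBIO, PDBParser, Select

    class ChainSelect(Select):
        # Accept only the chain whose ID matches chain_id; all others are skipped.
        def __init__(self, chain_id):
            self.chain_id = chain_id

        def accept_chain(self, chain):
            return chain.get_id() == self.chain_id

    parser = PDBParser(QUIET=True)  # QUIET suppresses parser warnings
    io = PDBIO()
    written = []
    for path in pdb_paths:
        structure = parser.get_structure(Path(path).stem, str(path))
        # Assumption: chain IDs of the first model are representative of the file
        chain_ids = [chain.get_id() for chain in structure[0]]
        io.set_structure(structure)
        for chain_id in chain_ids:
            out_path = os.path.join(out_dir, output_name(path, chain_id))
            io.save(out_path, select=ChainSelect(chain_id))
            written.append(out_path)
    return written
```

With the variables from the question, a call might look like split_pdb_files([os.path.join(str(PDB_RAW_DIR), f) for f in pdb_list]). Filtering through select keeps each chain inside its original Structure/Model hierarchy rather than writing a detached Chain object.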
Alternatives:
You can view a static version of a notebook here that contains a very short bash/sed version of splitting out the chains.
At the top of that notebook you can find a link to use and run an R-based version, pdbsplit, in the Bio3D package. You can run that notebook interactively by going here, clicking the launch binder badge, and then, in the session that comes up, choosing 'Split PDB files into chains using command line' from the available notebooks. There is also a stand-alone program called pdbsplitchains in this library.

Thanks a lot, Mensur! But this does not really help, as it is a solution for a single file and is basically what I already did. Meanwhile, I figured out that it is a fundamental problem with Biopython (I am using v1.77), as described here: https://github.com/biopython/biopython/pull/3223
The fix proposed there works.
Thanks anyway!
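The bash/sed alternative mentioned above works because, in the fixed-column PDB format, the chain identifier sits at column 22 of ATOM, HETATM and TER records. The following is a rough plain-Python analogue of that idea, not the notebook's actual code. It assumes one model per file and drops header, MODEL/ENDMDL and CONECT records. The function names are hypothetical, and writing a blank chain ID as "_" is an arbitrary choice.

```python
import os
from pathlib import Path

# Record names (columns 1-6) that carry a chain identifier in column 22
CHAIN_RECORDS = ("ATOM  ", "HETATM", "TER   ")


def split_pdb_text(text):
    """Group coordinate lines by chain ID (column 22, i.e. string index 21)."""
    chains = {}
    for line in text.splitlines():
        if line.startswith(CHAIN_RECORDS) and len(line) > 21:
            chains.setdefault(line[21], []).append(line)
    return chains  # dicts keep insertion order, so chains appear in file order


def split_pdb_file(pdb_path, out_dir="."):
    """Write one 'pdbid_chain.pdb' file per chain and return the written paths."""
    pdb_id = Path(pdb_path).stem
    written = []
    for chain_id, lines in split_pdb_text(Path(pdb_path).read_text()).items():
        label = chain_id if chain_id.strip() else "_"  # avoid "1abc_ .pdb"
        out_path = os.path.join(out_dir, f"{pdb_id}_{label}.pdb")
        with open(out_path, "w") as handle:
            handle.write("\n".join(lines) + "\nEND\n")
        written.append(out_path)
    return written
```

Processing all 208 files is then a loop such as: for f in pdb_list: split_pdb_file(os.path.join(str(PDB_RAW_DIR), f)).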