Question

Parsing Result File

0

Entering edit mode

13.9 years ago

Samimirza ▴ 60

Hi,

I am trying to match PDB records with results from dimplot.

I have managed to create a dictionary : pdb[chain][residue][residue_no]

Now I have some data in folders for each PDB, I am interested in parsing a particular file whose contents are shown below: for eg

pdb1aut: chains L and C


                                             Hydrogen bonds  Non-bonded
               Interacting residues          M-M   S-S   M-S  contacts
          ------------------------------     ---   ---   ---    -----  
Residues: [HIS  107  L] -> [PHE  208  C]      0     0     0        2

Residues: [PRO  144  L] -> [TRP  207  C]      0     0     0        1

Residues: [GLY  142  L] -> [TRP  207  C]      1     0     0        3

Now I want to match the residue details from dic created above and make: pdb[chain][residue][residue_no][M-M][S-S][M-S][non-bonded contacts]

i also need to count the number of such interactions for eg: TRP 207 C comes in line 2 and 3 both, so I need to add number of M-M interactions etc.

I still havenot written a code and I cant do anything complicated since I am a newbie in python.

python parsing pdb • 3.0k views

ADD COMMENT • link updated 13.9 years ago by Lee Katz ★ 3.2k • written 13.9 years ago by Samimirza ▴ 60

2

Entering edit mode

One thing that would be nice is to show us your code so far. Making a dictionary that is nested that far is also a bad idea. It may be a better idea to build a class to hold the information for each residue.

ADD REPLY • link 13.9 years ago by Gww ★ 2.7k

Ram · Answer 1 · 2010-12-28

GWW makes an excellent point about your code design. For learning purposes, it's best to post what code you have written so answers can build off of that. To get started, this implementation parses the file and organizes by top level chain and residues. The counts are collapsed with addition for identical residues, so the output for your example looks like:

% python2.6 parse_dimplot.py dimplot-out.txt
{('C', '207', 'TRP'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 4, 'S-S': 0},
 ('C', '208', 'PHE'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '107', 'HIS'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 2, 'S-S': 0},
 ('L', '142', 'GLY'): {'M-M': 1, 'M-S': 0, 'Non-bonded contacts': 3, 'S-S': 0},
 ('L', '144', 'PRO'): {'M-M': 0, 'M-S': 0, 'Non-bonded contacts': 1, 'S-S': 0}}

And the code:

import sys
import pprint

def _parse_chain_parts(c):
    return c.replace("[", "").replace(" -> ", "").split()

def _parse_stats(s):
    m_m, s_s, m_s, non_bond = s.rstrip().split()
    return {"M-M": int(m_m), "S-S": int(s_s), "M-S": int(m_s),
            "Non-bonded contacts": int(non_bond)}

def parse_dimplot_line(line):
    line = line.replace("Residues: ", "")
    chain1, chain2, stats_str = line.split("]")
    stats = _parse_stats(stats_str)
    for chain_str in [chain1, chain2]:
        residue, r_num, chain = _parse_chain_parts(chain_str)
        yield (chain, r_num, residue, stats)

def parse_dimplot(in_file):
    with open(in_file) as in_handle:
        for line in in_handle:
            if line.startswith("Residues:"):
                for chain in parse_dimplot_line(line):
                    yield chain

def combine_stats(one, two):
    combined = dict()
    for k, val in one.items():
        combined[k] = val + two[k]
    return combined

in_file = sys.argv[1]
organized = dict()
for chain, r_num, residue, stats in parse_dimplot(in_file):
    r_key = (chain, r_num, residue)
    if organized.has_key(r_key):
        organized[r_key] = combine_stats(stats, organized[r_key])
    else:
        organized[r_key] = stats
pprint.pprint(organized)

Ram · Answer 2 · 2010-12-31

You might decide to just use a standard library like BioPerl. If using Perl, you can create a mapping hash however you want and then proceed with your analysis using the code samples below.

my %map=(atom1=>[0,0,0,2], ...);

http://www.bioperl.org/Core/Latest/bptutorial.html#iii_9_1_using_3d_structure_objects_and_reading_pdb_files__structurei__structure__io_

  use Bio::Structure::IO;
  use strict;
  my $structio = Bio::Structure::IO->new(-file => $file);
  my $struc = $structio->next_structure;
  for my $chain ($struc->get_chains) {
     my $chainid = $chain->id;
     # one-letter chaincode if present, 'default' otherwise
     for my $res ($struc->get_residues($chain)) {
        my $resid = $res->id;
        # format is 3-lettercode - dash - residue number, e.g. PHE-20
        my $atoms = $struc->get_atoms($res);
        # actually a list of atom objects, used here to get a count
        print join "\t", $chainid,$resid,$atoms,"\n";
     }
  }

http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Structure/IO/pdb.html

$stream = Bio::Structure::IO->new(-file => $filename,
                                  -format => 'PDB');

while (my $structure = $stream->next_structure) {
        # do something with $structure
}