How To Download Atomic Coordinates For A Specific Chain From A Pdb File?
4
0
Entering edit mode
12.4 years ago
edweena ▴ 10

Hi,

What is the easiest way to download atomic coordinates for specific chains from multiple PDB files?

pdb • 8.0k views
ADD COMMENT
1
Entering edit mode
12.4 years ago
Pappu ★ 2.1k

The following should work fine. You can use Biopython for more flexibility.

for i in 3EML 2VTP 2VEZ 
do 
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep  ATOM $i.pdb | grep ' A  ' > $i\_A.pdb
done
ADD COMMENT
0
Entering edit mode

Thank you for your answer. Using your code, I got the atomic coordinates of A chain. If I need to get B chain from the first PDB file, C chain from the second one, how to change your code? sorry, I am new to perl. Could you give the name of the biopython module?

ADD REPLY
0
Entering edit mode

Replace A with B and so on. Read bash and grep manual.

ADD REPLY
1
Entering edit mode
10.7 years ago
Woa ★ 2.9k

It is better IMHO to download the entire PDB file, containing all the chain co-ordinates and then parse using some dedicated PDB parsers of Biopython, Bioperl or R (Bio-3d package) rather than any regex type parsers.

ADD COMMENT
0
Entering edit mode
10.7 years ago
Hamish ★ 3.3k

Assuming that you are just interested in the ATOM records then you will need to take into account the structure of the ATOM records. These are not structured as delimited fields, but are instead position based, thus the identification of the specific chain records must look at a specific position in the record to get the correct ATOM records.

Something like the following Perl script will do the job:

#!/usr/bin/env perl

# Enable Perl warnings
use strict;
use warnings;

# Load modules.
use File::Basename;
use Getopt::Long qw(:config no_ignore_case bundling);
use LWP;

### Default parameters.
# Base URL for dbfetch service used to fetch PDB entries.
my $dbfetchBaseUrl = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';
# PDB entry identifier.
my $pdbEntryIdListStr = '3E3Q,3E3Q,3EML';
# PDB chain identifier.
my $pdbChainIdListStr = 'A,a,A';

my $scriptName = basename( $0, () );
my $usage = <<EOF Extract="" ATOM="" records="" for="" a="" specified="" chain="" from="" a="" PDB="" entry.="" Usage:="" $scriptName="" -h="" $scriptName="" [-e="" pdbId]="" [-c="" chainId]="" -e,="" --entry="" PDB="" entry="" identifier="" list.="" [$pdbEntryIdListStr]="" -c,="" --chain="" PDB="" chain="" identifier="" list="" (case="" sensitive).="" [$pdbChainIdListStr]="" The="" list="" of="" entry="" Ids="" and="" chains="" Ids="" are="" assumed="" to="" be="" parallel,="" so="" to="" get="" the="" ATOM="" records="" for="" 3E3Q="" chain="" 'A',="" 3E3Q="" chain="" 'a'="" and="" 3EML="" chain="" 'A'="" the="" parameters="" would="" be:="" $scriptName="" -e="" 3E3Q,3E3Q,3EML="" -c="" A,a,A="" -h,="" --help="" Help="" usage="" message.="" EOF="" ;="" my="" $helpFlag;="" GetOptions(="" 'entry|e="s'" ==""> \$pdbEntryIdListStr,
    'chain|c=s' => \$pdbChainIdListStr,
    'help|h' => \$helpFlag,
    );

if($helpFlag) {
    print $usage;
    exit(0);
}

my @pdbEntryIdList = split(/[ ,]+/, $pdbEntryIdListStr);
my @pdbChainIdList = split(/[ ,]+/, $pdbChainIdListStr);
if(scalar(@pdbEntryIdList) != scalar(@pdbChainIdList)) {
    print STDERR @pdbEntryIdList;
    print STDERR @pdbChainIdList;
    die 'PDB entry Id list and PDB chain Id list lengths differ.';
}
for(my $i = 0; $i < scalar(@pdbChainIdList); $i++) {
    my $pdbEntryId = $pdbEntryIdList[$i];
    my $pdbChainId = $pdbChainIdList[$i];
    # Get the entry.
    my $pdbEntry = &fetchEntry('pdb', $pdbEntryId);
    if($pdbEntry) {
        # Extract 'ATOM' records.
        my @records = split(/\n+/, $pdbEntry);
        my @header_records = grep(/^HEADER/, @records);
        my @atom_records = grep(/^ATOM  .{14} $pdbChainId/, @records);
        print @header_records;
        print @atom_records;
    }
}

sub fetchEntry {
    my $dbName = shift;
    my $entryId = shift;
    # Create an LWP UserAgent for making HTTP calls.
    my $ua = LWP::UserAgent->new();
    # Configure HTTP proxy support from environment.
    $ua->env_proxy;

    my $requestUrl = $dbfetchBaseUrl . '/' . $dbName . '/' . $entryId . '?style=raw';
    print STDERR $requestUrl, "\n";
    my $response = $ua->get($requestUrl);
    if($response->is_error) {
        die 'http status: ' . $response->code . ' ' . $response->message;
    }
    my $content = $response->content();
    if($content =~ m/^ERROR/) {
        warn $content;
    }
    return $content;
}
ADD COMMENT
0
Entering edit mode

Can you put comments in that?

ADD REPLY
0
Entering edit mode
5.8 years ago
M.O.L.S ▴ 100

First, Specify which chain you want to look at.

Different PDB files may have different numbers of chains.

Use the chain methods to get the atoms and atomic coordinates.

If you are looking at multiple PDB files you have to find a way to wrap that together in a for loop.

ADD COMMENT

Login before adding your answer.

Traffic: 1854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6