Hi,
What is the easiest way to download atomic coordinates for specific chains from multiple PDB files?
Hi,
What is the easiest way to download atomic coordinates for specific chains from multiple PDB files?
The following should work fine. You can use Biopython for more flexibility.
for i in 3EML 2VTP 2VEZ
do
wget -c "http://www.pdb.org/pdb/download/downloadFile.do?fileFormat=pdb&compression=NO&structureId="$i -O $i.pdb
grep ATOM $i.pdb | grep ' A ' > $i\_A.pdb
done
It is better IMHO to download the entire PDB file, containing all the chain co-ordinates and then parse using some dedicated PDB parsers of Biopython, Bioperl or R (Bio-3d package) rather than any regex type parsers.
Assuming that you are just interested in the ATOM records then you will need to take into account the structure of the ATOM records. These are not structured as delimited fields, but are instead position based, thus the identification of the specific chain records must look at a specific position in the record to get the correct ATOM records.
Something like the following Perl script will do the job:
#!/usr/bin/env perl
# Enable Perl warnings
use strict;
use warnings;
# Load modules.
use File::Basename;
use Getopt::Long qw(:config no_ignore_case bundling);
use LWP;
### Default parameters.
# Base URL for dbfetch service used to fetch PDB entries.
my $dbfetchBaseUrl = 'http://www.ebi.ac.uk/Tools/dbfetch/dbfetch';
# PDB entry identifier.
my $pdbEntryIdListStr = '3E3Q,3E3Q,3EML';
# PDB chain identifier.
my $pdbChainIdListStr = 'A,a,A';
my $scriptName = basename( $0, () );
my $usage = <<EOF Extract="" ATOM="" records="" for="" a="" specified="" chain="" from="" a="" PDB="" entry.="" Usage:="" $scriptName="" -h="" $scriptName="" [-e="" pdbId]="" [-c="" chainId]="" -e,="" --entry="" PDB="" entry="" identifier="" list.="" [$pdbEntryIdListStr]="" -c,="" --chain="" PDB="" chain="" identifier="" list="" (case="" sensitive).="" [$pdbChainIdListStr]="" The="" list="" of="" entry="" Ids="" and="" chains="" Ids="" are="" assumed="" to="" be="" parallel,="" so="" to="" get="" the="" ATOM="" records="" for="" 3E3Q="" chain="" 'A',="" 3E3Q="" chain="" 'a'="" and="" 3EML="" chain="" 'A'="" the="" parameters="" would="" be:="" $scriptName="" -e="" 3E3Q,3E3Q,3EML="" -c="" A,a,A="" -h,="" --help="" Help="" usage="" message.="" EOF="" ;="" my="" $helpFlag;="" GetOptions(="" 'entry|e="s'" ==""> \$pdbEntryIdListStr,
'chain|c=s' => \$pdbChainIdListStr,
'help|h' => \$helpFlag,
);
if($helpFlag) {
print $usage;
exit(0);
}
my @pdbEntryIdList = split(/[ ,]+/, $pdbEntryIdListStr);
my @pdbChainIdList = split(/[ ,]+/, $pdbChainIdListStr);
if(scalar(@pdbEntryIdList) != scalar(@pdbChainIdList)) {
print STDERR @pdbEntryIdList;
print STDERR @pdbChainIdList;
die 'PDB entry Id list and PDB chain Id list lengths differ.';
}
for(my $i = 0; $i < scalar(@pdbChainIdList); $i++) {
my $pdbEntryId = $pdbEntryIdList[$i];
my $pdbChainId = $pdbChainIdList[$i];
# Get the entry.
my $pdbEntry = &fetchEntry('pdb', $pdbEntryId);
if($pdbEntry) {
# Extract 'ATOM' records.
my @records = split(/\n+/, $pdbEntry);
my @header_records = grep(/^HEADER/, @records);
my @atom_records = grep(/^ATOM .{14} $pdbChainId/, @records);
print @header_records;
print @atom_records;
}
}
sub fetchEntry {
my $dbName = shift;
my $entryId = shift;
# Create an LWP UserAgent for making HTTP calls.
my $ua = LWP::UserAgent->new();
# Configure HTTP proxy support from environment.
$ua->env_proxy;
my $requestUrl = $dbfetchBaseUrl . '/' . $dbName . '/' . $entryId . '?style=raw';
print STDERR $requestUrl, "\n";
my $response = $ua->get($requestUrl);
if($response->is_error) {
die 'http status: ' . $response->code . ' ' . $response->message;
}
my $content = $response->content();
if($content =~ m/^ERROR/) {
warn $content;
}
return $content;
}
First, Specify which chain you want to look at.
Different PDB files may have different numbers of chains.
Use the chain methods to get the atoms and atomic coordinates.
If you are looking at multiple PDB files you have to find a way to wrap that together in a for loop.
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Thank you for your answer. Using your code, I got the atomic coordinates of A chain. If I need to get B chain from the first PDB file, C chain from the second one, how to change your code? sorry, I am new to perl. Could you give the name of the biopython module?
Replace A with B and so on. Read bash and grep manual.