7.0 years ago
skjobs1234
I would like to extract the one-letter amino acid codes from a PDB coordinate file.
This can get you started; you can then modify it to suit your needs. Save it as get_pdb_aa.pl and run it with perl get_pdb_aa.pl < file.pdb > out.txt. It prints one letter per atom, so for the example you provided it outputs MMMMMLLLLLLLLKKKKKKKKKKKKKKK.
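#!/usr/bin/env perl
use warnings;
use strict;

# Three-letter residue names (lower case) mapped to one-letter codes
my %aa_table = (
    ala => 'A', arg => 'R', asn => 'N', asp => 'D',
    asx => 'B', cys => 'C', glu => 'E', gln => 'Q',
    glx => 'Z', gly => 'G', his => 'H', ile => 'I',
    leu => 'L', lys => 'K', met => 'M', phe => 'F',
    pro => 'P', ser => 'S', thr => 'T', trp => 'W',
    tyr => 'Y', val => 'V',
);

# Read the PDB file line by line from standard input
while ( my $line = <STDIN> ) {
    # Only ATOM records describe protein residues; skip HEADER, HETATM, TER, ...
    next unless $line =~ /^ATOM/;
    # The residue name sits in columns 18-20 of an ATOM record
    my $aa = substr( $line, 17, 3 );
    # One letter per atom line; 'X' for residue names not in the table
    print( $aa_table{ lc $aa } // 'X' );
}
print "\n";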
It would be helpful to paste an example PDB coordinate file. I'm not a Perl programmer, so I won't provide an answer myself, but an example will help the other Perl programmers here.
And what is the expected output? For this example, I guess it is MLKKK; is that right? Or do you want MMMMMLLLLLLL...?
Why Perl? Usually, when we're addressing a bioinformatics problem, we look for the best tool for the job rather than limiting ourselves to one specific tool.
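If one letter per residue (MLKKK) rather than one per atom is the goal, a minimal Perl sketch could collapse consecutive ATOM records that share the same chain ID, residue number and insertion code. It assumes standard fixed-column PDB records; the subroutine name residue_sequence and the example lines are made up for illustration, and residue names outside the 20 standard ones come out as X.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Three-letter residue names mapped to one-letter codes (20 standard residues only)
my %aa_table = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Hypothetical helper: one letter per residue (not per atom) from a list of PDB lines.
# A new residue starts whenever chain ID + residue number + insertion code change.
sub residue_sequence {
    my $seq      = '';
    my $prev_key = '';
    for my $line (@_) {
        next unless $line =~ /^ATOM/;          # skip HETATM, TER, headers, ...
        my $name = substr( $line, 17, 3 );     # residue name, columns 18-20
        my $key  = substr( $line, 21, 6 );     # chain (22), resSeq (23-26), iCode (27)
        next if $key eq $prev_key;             # another atom of the same residue
        $prev_key = $key;
        $seq .= $aa_table{$name} // 'X';       # 'X' for non-standard names
    }
    return $seq;
}

# Made-up example lines in fixed-column PDB format
my @example = (
    'ATOM      1  N   MET A   1       0.000   0.000   0.000',
    'ATOM      2  CA  MET A   1       0.000   0.000   0.000',
    'ATOM      3  N   LEU A   2       0.000   0.000   0.000',
    'ATOM      4  CA  LEU A   2       0.000   0.000   0.000',
    'ATOM      5  N   LYS A   3       0.000   0.000   0.000',
    'ATOM      6  N   LYS A   4       0.000   0.000   0.000',
    'HETATM    7  O   HOH A 101       0.000   0.000   0.000',
);

print residue_sequence(@example), "\n";    # MLKK
```

To run it on a real file, pass residue_sequence(<STDIN>) instead of the made-up @example. Keying on chain + residue number (rather than on the residue name) keeps two adjacent identical residues, such as the two LYS above, from being merged.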
I found one more way, using this site:
http://www.ebi.ac.uk/pdbe-srv/PDBeXplore/sequence/
Enter a structure ID and/or the author's surname:
http://www.ebi.ac.uk/pdbe/entry/pdb/1aa0/
Then use "Fetch sequence" in the bottom-left corner of the page:
VSGLNNAVQNLQVEIGNNSAGIKGQVVALNTLVNGTNPNGSTVEERGLTNSIKANETNIASVTQEVNTAKGNISSLQGDVQALQEAGYIPEAPRDGQAYVRKDGEWVLLSTFL
OR, from the Quick links in the upper-right corner, go to 1aa0 overview, then Macromolecules:
http://www.ebi.ac.uk/pdbe/entry/pdb/1aa0/protein/1
There are several ways to find a sequence.
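One more way that stays within the PDB file itself: if the header is present, SEQRES records list each chain's full deposited sequence, including residues that have no coordinates. Below is a rough Perl sketch that assumes the standard SEQRES layout (chain ID in column 12, residue names from column 20 on); seqres_by_chain and the example records are hypothetical.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Three-letter residue names mapped to one-letter codes (20 standard residues only)
my %aa_table = (
    ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
    GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
    LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
    SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
);

# Hypothetical helper: collect the SEQRES sequence of each chain.
# Returns a hash reference: chain ID => one-letter sequence.
sub seqres_by_chain {
    my %seq;
    for my $line (@_) {
        next unless $line =~ /^SEQRES/;
        my $chain = substr( $line, 11, 1 );            # chain ID, column 12
        # residue names start at column 20, separated by spaces
        for my $name ( split ' ', substr( $line, 19 ) ) {
            $seq{$chain} .= $aa_table{$name} // 'X';   # 'X' for non-standard names
        }
    }
    return \%seq;
}

# Made-up SEQRES records: chain A with 15 residues over two lines, chain B with 2
my @example = (
    'SEQRES   1 A   15  MET LEU LYS LYS LYS GLY ALA SER THR VAL ILE PRO PHE',
    'SEQRES   2 A   15  TRP TYR',
    'SEQRES   1 B    2  GLY GLY',
);

my $seqs = seqres_by_chain(@example);
print ">chain $_\n$seqs->{$_}\n" for sort keys %$seqs;
```

Unlike the ATOM-based approach, this gives one sequence per chain regardless of which residues were resolved, so the two can differ for the same entry.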