I'm trying to find out the number and position of an amino acid, lysine, in trastuzumab. Does anyone know of appropriate software that I could use to determine this?
Much appreciated.
Just to expand on the answer by Bio_X2Y: this is a "classic" type of bioinformatics task, for which there is unlikely to be an online application or a ready-made software solution. A roll-your-own solution is almost expected; it's considered trivial yet nobody provides the software :-)
Here's one that uses the Bio::SeqIO module from BioPerl. It assumes that you have saved the light chain sequence in FASTA format to the file lc.fa:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;

# open the FASTA file and loop over every sequence record in it
my $inseq = Bio::SeqIO->new(-file => "lc.fa", -format => "fasta");
while (my $seq = $inseq->next_seq) {
    my @aa    = split("", $seq->seq);               # one residue per array element
    my @index = grep { $aa[$_] =~ /K/i } 0..$#aa;   # 0-based indices that hold a K
    @index    = map { $_ + 1 } @index;              # convert 0-based array to 1-based sequence
    print "Lys at ", join(", ", @index), "\n";
}
Result:
Lys at 39, 42, 45, 103, 107, 126, 145, 149, 169, 183, 188, 190, 207
This is not my area, so don't treat this as authoritative.
Trastuzumab is an IgG-kappa monoclonal antibody. An antibody is made up of two identical heavy chains (in this case gamma) and two identical light chains (in this case kappa). So I imagine your question can be broken down into two parts: getting the sequences of the heavy and light chains, and finding the lysine positions within each of them.
I'm not familiar with where most people get official drug sequences, but the Wikipedia page for trastuzumab provides a link to a DrugBank card, which provides sequences for both chains (plus some other variant formats that I don't understand).
There is software out there that can be used to find a particular amino acid in a given sequence (e.g. standalone BLAST), but I'm not familiar with anything that can be installed and run quickly. Since you are only dealing with two small sequences (heavy chain 451 aa, light chain 214 aa), I suggest you write a simple Perl script to find the lysine (K) residue positions (it should only require 5-10 lines). If you're not comfortable with scripting, or with setting up software like BLAST, I suggest you manually identify the K's - hopefully it won't take more than a few minutes!
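For what it's worth, here is a rough sketch of the kind of short script meant above, using core Perl only (no BioPerl). The function name residue_positions and the toy sequence are made up for illustration; the toy string is not the real trastuzumab sequence, so paste in the chain you get from DrugBank instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based positions of a single-letter
# residue (default K, lysine) in a protein sequence string.
sub residue_positions {
    my ($seq, $residue) = @_;
    $residue = 'K' unless defined $residue;
    $seq =~ s/\s+//g;                    # drop spaces and line breaks
    my @positions;
    while ($seq =~ /\Q$residue\E/gi) {   # case-insensitive, one match per loop
        # pos() is the offset just after the match, which for a
        # one-letter match equals the 1-based residue position
        push @positions, pos($seq);
    }
    return @positions;
}

# Toy sequence for illustration only (NOT trastuzumab)
my $toy = "MKTAYIAKQRK";
my @lys = residue_positions($toy);
print scalar(@lys), " Lys at ", join(", ", @lys), "\n";
```

On the toy sequence this prints "3 Lys at 2, 8, 11", i.e. both the number and the positions asked for in the question. Run it once per chain; remember the intact antibody has two copies of each chain, so the total per molecule would be double the per-chain counts.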
The EMBOSS suite, either standalone or web-based (e.g. http://pro.genomics.purdue.edu/emboss/), includes a program called fuzzpro that takes a protein sequence and a pattern, K in this case, and reports the positions where the pattern occurs in the sequence.
For antibody sequences you can use Abysis (http://www.bioinf.org.uk/abysis/tools/analyze.cgi), which will number your sequences using the standard Kabat or Chothia numbering scheme.
Excellent answer.