Question

Finding The Position Of Amino Acid

14.4 years ago

William Angliss ▴ 50

I'm trying to find out the number and positions of one amino acid, lysine, in trastuzumab. Does anyone know of appropriate software that I could use to determine this?

Much appreciated.

amino-acids sequence position • 9.8k views

updated 14.3 years ago by Jake ▴ 150 • written 14.4 years ago by William Angliss ▴ 50

Ram · Answer 1 · 2010-12-15

Just to expand on the answer by Bio_X2Y: this is a "classic" type of bioinformatics task, for which there is unlikely to be an online application or a ready-made software solution. A roll-your-own solution is almost expected; it's considered trivial yet nobody provides the software :-)

Here's one that uses the Bio::SeqIO module from BioPerl. It assumes that you have saved the light chain sequence in FASTA format to the file lc.fa:

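#!/usr/bin/perl -w

use strict;
use Bio::SeqIO;

# open the FASTA file; every record in it is processed in turn
my $inseq = Bio::SeqIO->new(-file => "lc.fa", -format => "fasta");

while(my $seq = $inseq->next_seq) {
  # split the sequence string into an array of single residues
  my @aa = split("", $seq->seq);
  # keep the array indices whose residue is K (case-insensitive)
  my(@index) = grep { $aa[$_] =~ /K/i } 0..$#aa;
     @index  = map {$_ + 1} @index;  # convert 0-based array to 1-based sequence
  print "Lys at ", join(", ", @index), "\n";
}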

Result:

Lys at 39, 42, 45, 103, 107, 126, 145, 149, 169, 183, 188, 190, 207

score 7 · Answer 2 · 2010-12-15

This is not my area, so don't treat this as authoritative.

Trastuzumab is an IgG-kappa monoclonal antibody. An antibody is made up of two identical heavy chains (in this case a gamma) and two identical light chains (in this case a kappa). So I imagine your question can be broken down into:

Where can I get the protein sequences for the trastuzumab heavy and light chains?
How can I find lysines within these chains?

I don't know where most people get official drug sequences, but the Wikipedia page for trastuzumab links to a DrugBank card, which provides sequences for both chains (plus some other variant formats that I don't understand).

There is software that can be used to find a particular amino acid in a given sequence (e.g. standalone BLAST), but I'm not aware of anything that can be installed and run quickly. Since you are only dealing with two short sequences (heavy chain 451 aa, light chain 214 aa), I suggest you write a simple Perl script to find the positions of the lysine (K) residues; it should only require 5-10 lines. If you're not comfortable with scripting, or with setting up software like BLAST, I suggest you identify the K's manually. Hopefully it won't take more than a few minutes!
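
For anyone who would rather not install BioPerl, here is a rough dependency-free sketch of the same idea, written in Python rather than Perl. It reports both the number and the 1-based positions of lysines for every record in a FASTA file, so both chains can go in one file. The function names `read_fasta` and `residue_positions` and the toy sequences are made up for illustration; they are not the real trastuzumab chains, which you would paste in from DrugBank.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def residue_positions(sequence, residue="K"):
    """Return the 1-based positions of `residue` in `sequence` (case-insensitive)."""
    target = residue.upper()
    return [i for i, aa in enumerate(sequence.upper(), start=1) if aa == target]


# Toy input for illustration only (NOT the real heavy/light chains).
toy_fasta = """>toy_light
DIQMTKSPK
>toy_heavy
EVKLVESGGK
"""

for name, seq in read_fasta(toy_fasta.splitlines()):
    hits = residue_positions(seq)
    print(f"{name}: {len(hits)} Lys at {', '.join(map(str, hits))}")
```

To run it on real data, replace `toy_fasta.splitlines()` with an open file handle, e.g. `open("chains.fa")`, where `chains.fa` is whatever you named your FASTA file.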

score 5 · Answer 3 · 2010-12-15

14.4 years ago

Julien ▴ 160

The EMBOSS suite, either standalone or web-based (e.g. http://pro.genomics.purdue.edu/emboss/), includes a program called fuzzpro that takes a protein sequence and a pattern, K in this case, and reports the position of each match in the sequence.
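
If you just want to see what a pattern search of this kind boils down to, here is a minimal Python sketch. It is not how fuzzpro is implemented: fuzzpro takes PROSITE-style patterns, whereas this stand-in uses an ordinary regular expression, and the function name `find_pattern` and the toy sequence are made up for illustration.

```python
import re


def find_pattern(sequence, pattern):
    """Return (start, end, match) for each non-overlapping regex hit.

    start and end are 1-based and inclusive, as in typical sequence reports.
    """
    hits = []
    for m in re.finditer(pattern, sequence.upper()):
        # m.start() is 0-based; m.end() is exclusive, so it equals the 1-based end
        hits.append((m.start() + 1, m.end(), m.group()))
    return hits


# Toy sequence for illustration only (not a real antibody chain).
toy = "DIQMTKSPKK"
print(find_pattern(toy, "K"))       # every single lysine
print(find_pattern(toy, "K[ST]P"))  # a lysine followed by S or T, then P
```

For a single residue such as K the regex is trivial, but the same function covers short motifs as well. Note that `re.finditer` only returns non-overlapping matches.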

Nice find. I forgot to look at EMBOSS; it has a tool for most occasions (and as you mention, web interfaces for non-coders).

Reply • 14.4 years ago by Neilfws 49k

score 1 · Answer 4 · 2010-12-15

14.4 years ago

Jake ▴ 150

For antibody sequences you can use Abysis (http://www.bioinf.org.uk/abysis/tools/analyze.cgi), which will number your sequences using the standard Kabat or Chothia numbering scheme.
