Question

Protein Sequence Alignment

9.8 years ago

pwg46 ▴ 540

Say I am given a protein U1 from the UniProt database, and, according to UniProt's mapping data file, U1 maps to R1 in RefSeq's protein database. While the sequences of U1 and R1 are very similar, len(R1) > len(U1), presumably because R1 contains some extra region. What is an efficient way to align these two proteins? That is, I want to make len(U1) == len(R1), with the chunk that U1 is missing filled in with a gap symbol, e.g. "-". Would I have to use some recursive segmentation algorithm?

uniprot refseq protein sequence alignment • 1.9k views

updated 9.8 years ago by dago ★ 2.8k • written 9.8 years ago by pwg46 ▴ 540

score 0 · Answer 1 · 2015-02-11

9.8 years ago

Jean-Karim Heriche 27k

For global alignment of two sequences, you're looking for the Needleman-Wunsch algorithm.
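A minimal sketch of Needleman-Wunsch in plain Python, to show how a global alignment produces exactly the padded output asked for: both returned strings have the same length, and residues of R1 with no counterpart in U1 appear opposite "-". The scoring here (match +1, mismatch -1, linear gap -2) is an assumption for illustration only; real protein alignments normally use a substitution matrix such as BLOSUM62 and affine gap penalties, and in practice you would use an existing tool (e.g. EMBOSS needle or Biopython's PairwiseAligner) rather than this function. The sequences in the usage example are toy strings, not real UniProt/RefSeq entries.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str]:
    """Global alignment of a and b with a linear gap penalty (illustrative scoring)."""
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        # Assumed toy scoring; swap in a BLOSUM62 lookup for real proteins.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell, rebuilding both gapped strings.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    u1 = "MKTAYIAKQR"      # toy "U1"
    r1 = "MKTAYWWWIAKQR"   # toy "R1" with an extra internal region
    aligned_u, aligned_r = needleman_wunsch(u1, r1)
    print(aligned_u)  # MKTAY---IAKQR
    print(aligned_r)  # MKTAYWWWIAKQR
```

No recursive segmentation is needed: the dynamic-programming table fills in O(len(U1) x len(R1)) time, which is small for individual protein pairs.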

score 0 · Answer 2 · 2015-02-12

9.8 years ago

dago ★ 2.8k

I think you have to perform a global alignment, as @Jean-Karim Heriche said. There are many tools that can create an end-to-end alignment, for example here. With this pairwise alignment you should be able to see precisely the region that you want to remove in R1.
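Once you have the pairwise alignment, locating the extra region is a simple scan over the columns. The sketch below uses a hypothetical helper, `missing_regions`, that takes the two gapped strings (from whatever global aligner you used) and reports each run of gaps in U1 together with the R1 residues opposite it, using 1-based inclusive coordinates on the ungapped R1 sequence. The input strings are toy examples, not real UniProt/RefSeq sequences.

```python
from typing import List, Tuple


def missing_regions(aligned_u: str, aligned_r: str) -> List[Tuple[int, int, str]]:
    """Hypothetical helper: (start, end, segment) for each gap run in aligned_u.

    start/end are 1-based, inclusive positions in the ungapped R1 sequence.
    """
    if len(aligned_u) != len(aligned_r):
        raise ValueError("aligned sequences must have equal length")
    regions: List[Tuple[int, int, str]] = []
    r_pos = 0                  # R1 residues seen so far
    run_start = 0              # R1 position where the current gap run began
    run_chars: List[str] = []  # R1 residues opposite the current gap run
    for u_char, r_char in zip(aligned_u, aligned_r):
        if r_char != "-":
            r_pos += 1
        if u_char == "-" and r_char != "-":
            # R1 residue with no counterpart in U1: extend (or start) a run.
            if not run_chars:
                run_start = r_pos
            run_chars.append(r_char)
        elif run_chars and u_char != "-":
            # A U1 residue ends the current run.
            regions.append((run_start, run_start + len(run_chars) - 1, "".join(run_chars)))
            run_chars = []
    if run_chars:  # run that extends to the end of the alignment
        regions.append((run_start, run_start + len(run_chars) - 1, "".join(run_chars)))
    return regions


if __name__ == "__main__":
    print(missing_regions("MKTAY---IAKQR", "MKTAYWWWIAKQR"))
    # [(6, 8, 'WWW')]
```

Columns where both strings have a gap (which a pairwise aligner should not produce) are simply skipped.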