Question

Fastest pairwise alignment for 10,000 sequences

0

8.9 years ago

sheinsch ▴ 10

I need to compute pairwise alignment scores for 10,000 amino acid sequences ranging from 200 aa to 4,000 aa in length. I am currently using the EMBOSS wrapper in Python to do the comparisons. However, judging by the rate at which the alignments are running, the whole batch will take roughly 2,000 days to complete. That seems far too long, and I am guessing there is a better way to do what I am setting out to do.

What I have tried already:

I have excluded any comparison where the difference in sequence lengths alone means the identity cannot be higher than 50%.
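For anyone reading along, here is a minimal sketch of that kind of length prefilter. It assumes identity is counted as matches divided by the length of the longer sequence, so the best possible identity for a pair is shorter length / longer length. The function names and the example IDs and lengths are made up for illustration, and a different identity definition would need a different bound.

```python
def max_possible_identity(len_a: int, len_b: int) -> float:
    """Upper bound on identity, assuming identity = matches / longer length."""
    return min(len_a, len_b) / max(len_a, len_b)


def candidate_pairs(lengths: dict, min_identity: float = 0.5):
    """Yield ID pairs whose lengths still allow an identity above min_identity."""
    # Sort by length so the inner loop can stop as soon as one partner is too long.
    ordered = sorted(lengths.items(), key=lambda item: item[1])
    for i in range(len(ordered)):
        id_a, len_a = ordered[i]
        for j in range(i + 1, len(ordered)):
            id_b, len_b = ordered[j]
            if max_possible_identity(len_a, len_b) <= min_identity:
                break  # every later sequence is at least as long, so it fails too
            yield id_a, id_b


# Hypothetical sequence IDs and lengths, only to show the behaviour.
lengths = {"seqA": 200, "seqB": 390, "seqC": 410, "seqD": 4000}
pairs = list(candidate_pairs(lengths))
print(pairs)
```

Sorting first means each sequence is only compared against partners in its own length window, which avoids even enumerating most of the roughly 50 million possible pairs.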

alignment • 3.4k views

updated 2.5 years ago by Ram 44k • written 8.9 years ago by sheinsch ▴ 10

3

This is why they invented the BLAST algorithm.

8.9 years ago by Benn 8.4k

Ram · Answer 1 · 2016-02-15

2

8.9 years ago

abascalfederico ★ 1.2k

For local alignments I would use BLAST. It will take hours, not days.
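A rough sketch of an all-vs-all protein BLAST driven from Python, assuming NCBI BLAST+ (`makeblastdb` and `blastp`) is installed and on the PATH. The file names, thread count, and E-value cutoff are placeholders, not recommendations. The parser relies on the default column order of tabular output (`-outfmt 6`): qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.

```python
import subprocess


def build_blast_commands(fasta, db_prefix, out_tsv, threads=4, evalue="1e-5"):
    """Return the makeblastdb and blastp argument lists for an all-vs-all search."""
    make_db = ["makeblastdb", "-in", fasta, "-dbtype", "prot", "-out", db_prefix]
    search = [
        "blastp", "-query", fasta, "-db", db_prefix,
        "-outfmt", "6", "-evalue", evalue,
        "-num_threads", str(threads), "-out", out_tsv,
    ]
    return make_db, search


def run_all_vs_all(fasta, db_prefix, out_tsv, threads=4):
    """Build the database, then search every sequence against it."""
    for cmd in build_blast_commands(fasta, db_prefix, out_tsv, threads):
        subprocess.run(cmd, check=True)  # raises if BLAST exits with an error


def parse_tabular_line(line):
    """Extract query, subject, percent identity and bit score from one outfmt 6 line."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], float(fields[2]), float(fields[11])


# Made-up hit line, only to show the column layout.
example = "seqA\tseqB\t62.50\t200\t75\t0\t1\t200\t1\t200\t1e-80\t250"
hit = parse_tabular_line(example)
print(hit)
```

Calling `run_all_vs_all("seqs.fasta", "seqs_db", "hits.tsv")` would run both steps. Note that BLAST limits the number of hits reported per query by default (see `-max_target_seqs`), and that the percent identity it reports covers only the locally aligned region, not the full length of either sequence.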

If you need global alignments, and all the sequences are homologous and share the same domains, you could build a multiple sequence alignment with MAFFT and calculate the percent identity of each pair from it.
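A minimal sketch of the second step, assuming the alignment has already been produced in aligned FASTA format (for example with `mafft --auto input.fasta > aligned.fasta`). Identity definitions differ between tools. Here it is taken as identical residues divided by the number of columns where at least one of the two sequences has a residue, which is an assumption you may want to change. The three short sequences are invented for illustration.

```python
from itertools import combinations


def read_aligned_fasta(text):
    """Parse aligned FASTA text into {name: aligned_sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID, drop the description
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def percent_identity(a, b, gap="-"):
    """Percent identity of two rows taken from the same alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue  # column is a gap in both rows, so it is not part of this pair
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


# Toy alignment, made up for illustration.
aligned = """>s1
MKT-AYIA
>s2
MKTQAYLA
>s3
MK--AYIA
"""
seqs = read_aligned_fasta(aligned)
identities = {
    (a, b): percent_identity(seqs[a], seqs[b])
    for a, b in combinations(seqs, 2)
}
print(identities)
```

The pairwise loop is only string comparison, so it is cheap next to running a separate dynamic-programming alignment for every pair. The numbers are only as good as the multiple alignment itself, which is why this approach suits sequences that really are homologous along their full length.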

updated 2.5 years ago by Ram 44k • written 8.9 years ago by abascalfederico ★ 1.2k

0

Thanks, I will give that a shot.

8.9 years ago by sheinsch ▴ 10