Computing pairwise AA global sequence alignment between all pairs in a vector of sequences
15 months ago
rubic ▴ 270

Hi,

I have a long vector of AA sequences (~10,000) and I need to compute the global sequence alignment score for every possible pair. That is ~100 million ordered pairs, which shrinks to ~50 million unique pairs once redundancy (A vs. B is the same as B vs. A) is removed.
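
For reference, the pair count can be checked with a few lines of Python. This is only a sanity check of the arithmetic, assuming n = 10,000 sequences as stated above; the variable names are illustrative.

```python
from math import comb

n = 10_000                  # number of sequences, as stated in the question
ordered_pairs = n * n       # every (i, j), counting both orders and i == j
unique_pairs = comb(n, 2)   # unordered pairs with i < j, i.e. n * (n - 1) / 2

print(f"{ordered_pairs:,}")  # 100,000,000
print(f"{unique_pairs:,}")   # 49,995,000
```

The count grows quadratically, so doubling the number of sequences roughly quadruples the number of alignments.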

Is there a tool or package (preferably in R, but Python would also work) that does this efficiently?

pairwiseAlignment • 940 views

What have you tried so far? Was pairwiseAlignment (Biostrings) too slow?


So far I've tried pairwiseAlignment from R's Biostrings, but it is impractical at the scale I'm looking at: every unique pair among ~10,000 sequences.


OK, in that case I think you have two options:
1) Parallelization. If you have access to a multi-core machine or a compute cluster, you can split the work across multiple processes or jobs. Exactly how to do this depends on the infrastructure you work on.
2) Give up on some of the alignments. Do you actually need all pairwise alignments? What if you just run an all-vs-all BLAST, so that each sequence finds its top matches? You didn't say exactly what you're trying to do, so I don't know whether this is an option.
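
To make option 1 concrete, here is a minimal Python sketch that spreads pairwise global alignment scoring over several processes. It is an illustration under stated assumptions, not a tuned implementation: `nw_score` is a plain Needleman-Wunsch scorer with a toy match/mismatch/linear-gap scheme (a real AA analysis would more likely use a substitution matrix such as BLOSUM62 and affine gaps), and the names `score_all_pairs`, `workers` and `chunksize` are made up for this example.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

# Assumed toy scoring scheme, for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -2


def nw_score(a, b):
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty.

    Only the previous row of the DP table is kept, so memory is O(len(b)).
    """
    prev = [j * GAP for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * GAP]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (MATCH if ca == cb else MISMATCH)
            curr.append(max(diag, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def _score_pair(task):
    """Worker: unpack one (i, j, seq_i, seq_j) task and score it."""
    i, j, a, b = task
    return i, j, nw_score(a, b)


def score_all_pairs(seqs, workers=1, chunksize=1000):
    """Score every unique pair (i < j); use a process pool if workers > 1."""
    tasks = ((i, j, seqs[i], seqs[j])
             for i, j in combinations(range(len(seqs)), 2))
    if workers <= 1:
        return list(map(_score_pair, tasks))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches tasks to reduce inter-process overhead.
        return list(pool.map(_score_pair, tasks, chunksize=chunksize))
```

When running this as a script with `workers` greater than 1, call `score_all_pairs` from inside an `if __name__ == "__main__":` block, which the `multiprocessing` machinery needs on platforms that spawn worker processes. At the full scale discussed here, collecting every result in one list would be memory-hungry, so writing scores out in batches would probably be the better design.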


Parallelization is the answer, along with hashing. There are a couple of Python packages that might be suitable for this (e.g., scirpy and tcrdist), but I still need to work out how to use them for my purpose.
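
The post doesn't say what "hashing" refers to, so the following is only one possible reading, sketched in plain Python rather than with scirpy or tcrdist: (a) hash identical sequences together so each distinct sequence is aligned only once, and (b) build a k-mer index so that only pairs sharing at least one k-mer are passed on to the aligner. The function names and the choice of k = 3 are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def unique_sequences(seqs):
    """Map each distinct sequence to the positions where it occurs."""
    positions = defaultdict(list)
    for idx, s in enumerate(seqs):
        positions[s].append(idx)
    return positions


def candidate_pairs(seqs, k=3):
    """Return pairs (i, j), i < j, whose sequences share at least one k-mer."""
    index = defaultdict(set)  # k-mer -> indices of sequences containing it
    for idx, s in enumerate(seqs):
        for p in range(len(s) - k + 1):
            index[s[p:p + k]].add(idx)
    pairs = set()
    for members in index.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


seqs = ["CASSLGT", "CASSLGA", "WWWWWWW"]
print(candidate_pairs(seqs))  # {(0, 1)}
```

Note that the k-mer filter is a heuristic: pairs with no shared k-mer are never scored, so it only helps if scores for dissimilar pairs are not needed.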
