All vs. all pairwise sequence alignment
7.8 years ago
mschmid ▴ 180

I have a FASTA file with amino acid sequences. I want to align every sequence with every other sequence (so, pairwise). The result should be a distance matrix (or a similarity matrix, either is fine).

I would like to do local alignment. EDIT: The sequences do NOT have the same length and may share only partial regions of similarity.

What tool would you suggest?

Ideally I can later do the same with nucleotide sequences.

alignment local fasta • 5.2k views
7.8 years ago

Are you really sure you need all-vs-all pairwise alignments and cannot work from one alignment of all the sequences together (a multiple alignment)? That is much faster and easier.

Use R! See the phangorn manual (page 20): https://cran.r-project.org/web/packages/phangorn/phangorn.pdf

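library(phangorn)
setwd("your_working_dir")

# Read an already-aligned FASTA file; use type = "DNA" for nucleotide sequences
x <- read.phyDat("my_aligned_fasta.fasta", format = "fasta", type = "AA")
# Maximum-likelihood pairwise distances; "WAG" is only an example,
# pick the substitution model that suits your data
y <- dist.ml(x, model = "WAG")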

This solution expects all seqs to be the same length, right?


Yes. To compute distances this way, the sequences must already be aligned.
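
To illustrate why alignment is a prerequisite here, below is a minimal Python sketch of a column-by-column distance (the simple p-distance, not the maximum-likelihood distance that dist.ml computes). The function name and the toy rows are made up for illustration; the point is only that position i in one row must correspond to position i in the other.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Fraction of differing residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


# Toy aligned rows (hypothetical data, all the same length)
aligned = {"s1": "MKT-AYI", "s2": "MKTQAYL", "s3": "MRT-GYI"}
names = sorted(aligned)
matrix = [[p_distance(aligned[i], aligned[j]) for j in names] for i in names]
```

Unaligned sequences of different lengths are rejected by the length check, which is the same constraint the comment above describes.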

7.8 years ago
fhsantanna ▴ 620

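ClustalO (Clustal Omega) can write a pairwise distance matrix to a file (--distmat-out=<file> option). You will probably also need --full so that the complete matrix is computed.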


ClustalO does multiple sequence alignment, not pairwise, right?
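
For contrast, a truly pairwise approach scores each pair of sequences independently. The Python sketch below does this with a plain Smith-Waterman local alignment score and turns the scores into a distance matrix. It rests on several assumptions that are not from this thread: a simple match/mismatch scheme with a linear gap penalty (real protein work would normally use a substitution matrix such as BLOSUM62 and affine gaps), non-empty sequences, and a normalisation (1 minus score over the smaller self-score) chosen only for illustration.

```python
from itertools import combinations


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {identifier: sequence}."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # identifier is the first word of the header
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def distance_matrix(seqs: dict[str, str]) -> tuple[list[str], list[list[float]]]:
    """All-vs-all distances: 1 - local score / smaller self-score (assumed normalisation)."""
    names = list(seqs)
    self_score = {n: sw_score(seqs[n], seqs[n]) for n in names}
    dist = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        a, b = names[i], names[j]
        score = sw_score(seqs[a], seqs[b])
        d = 1.0 - score / min(self_score[a], self_score[b])
        dist[i][j] = dist[j][i] = d
    return names, dist
```

Each pair costs time proportional to the product of the two sequence lengths, and the number of pairs grows quadratically, so pure Python like this is only practical for small inputs; an optimised aligner is the better choice beyond that.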

7.8 years ago
buchfink ▴ 250

You can try the DIAMOND aligner: https://github.com/bbuchfink/diamond
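
As a rough sketch of how this could produce a matrix: build a database from the FASTA file (diamond makedb --in seqs.fasta -d seqs), search the same file against it (diamond blastp -d seqs -q seqs.fasta -o hits.tsv), then fold the hit table into a similarity matrix. The file names are hypothetical. The Python below assumes the default BLAST tabular output, where the first two columns are query and subject IDs and the twelfth is the bit score. Pairs with no reported hit are left at 0, and limits such as --max-target-seqs may mean that not every pair is reported.

```python
import csv
import io


def bitscore_matrix(tsv_text: str, names: list[str]) -> list[list[float]]:
    """Symmetric matrix of best bit scores from BLAST tabular (12-column) rows."""
    index = {n: i for i, n in enumerate(names)}
    sim = [[0.0] * len(names) for _ in names]
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bits = row[0], row[1], float(row[11])
        if query not in index or subject not in index:
            continue  # ignore IDs that are not in the requested name list
        i, j = index[query], index[subject]
        # Keep the best score seen for the pair, in either direction
        best = max(sim[i][j], bits)
        sim[i][j] = sim[j][i] = best
    return sim
```

Typical use would be to read hits.tsv into a string and pass it in together with the sequence IDs in the order the matrix should follow. Bit scores are similarities, not distances, so any conversion to a distance is a further modelling choice.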

7.8 years ago
kloetzl ★ 1.1k

If all you want is a distance matrix, why bother aligning at all? Try http://spaced.gobics.de/ for amino acids.
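
To give a feel for the alignment-free idea, here is a minimal Python sketch that compares sequences by the words they contain. It is not the spaced-words method from the link: it is a much simpler stand-in using contiguous k-mers and the Jaccard distance, and the choice k = 3 is arbitrary.

```python
def kmers(seq: str, k: int = 3) -> set[str]:
    """Set of all contiguous words of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """1 - |shared k-mers| / |all k-mers|; 0 means identical word content."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1.0 - len(ka & kb) / len(union)


def kmer_distance_matrix(seqs: dict[str, str], k: int = 3) -> list[list[float]]:
    """All-vs-all Jaccard distances, rows and columns in dict order."""
    names = list(seqs)
    return [[jaccard_distance(seqs[i], seqs[j], k) for j in names] for i in names]
```

Because no alignment is computed, sequences of different lengths are handled directly, and the cost per pair is roughly linear in sequence length.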
