Question

Is There An Easy Way To Properly Orient Sequences For Clustalw Alignment?

5

12.9 years ago

David M ▴ 580

I have a number (~500) of 1-2kbp sequences which I'd like to align using clustalw; in this case the sequences are from transposable elements. The problem is that (obviously) the sequences have a distinct directionality, and the program I use to mine the sequences from a genomic contig set doesn't guarantee any particular orientation. My concern is that for some sequences I should be aligning the reverse complement, rather than the original orientation.

Is there a way in clustalw to get around this problem? Is there some other program that can put my sequences in the proper orientation before I align them with clustalw?

A quick example:

ATCGCGATATCG and CGATATCGCGAT can clearly be aligned, since the second sequence is the reverse complement of the first. If I ran clustalw with them as is, however, the alignment would be far from ideal.
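The example can be checked with a few lines of Python. This is only a minimal sketch: the function name `reverse_complement` and the translation table are illustrative and not part of clustalw or any other tool mentioned here. The table also covers the IUPAC ambiguity codes, so unknown ('N') bases pass through unchanged.

```python
# Complement table for A/C/G/T plus the IUPAC ambiguity codes (N, R, Y, ...).
# Illustrative helper only; not taken from any aligner.
_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVNacgtryswkmbdhvn",
    "TGCAYRSWMKVHDBNtgcayrswmkvhdbn",
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


first = "ATCGCGATATCG"
second = "CGATATCGCGAT"

# True when the second sequence is the reverse complement of the first.
print(reverse_complement(first) == second)
```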

clustalw alignment multiple • 12k views

Ram · Answer 1 · 2012-01-07

4

12.9 years ago

Ahdf-Lell-Kocks ★ 1.6k

You can use PAGAN, which looks at both orientations when run with the --compare-reverse option:

./pagan --compare-reverse --readsfile sequences.fasta

Hi David,

PAGAN is a new program and is still actively developed. The feature you were using is very recent and indeed didn't support DNA ambiguity codes. I've now pushed an updated version that fixes this issue. The latest version can be obtained with 'git'.

The new version should do the reverse-complement alignment with ambiguity characters. It also supports translated alignment, and translated alignment using the best ORF in the read sequences. Unfortunately I haven't had time to document all the new features. Please contact me if you find them interesting.

Regards, Ari

Reply 12.9 years ago by Ari ▴ 120

Does pagan allow for the presence of unknown ('N') characters? I'm getting an error that says: "Unexpected characters found. Reverse-complement failed".

Reply 12.9 years ago by David M ▴ 580

score 2 · Answer 2 · 2012-01-05

Perhaps you could build a small database of known sequences and then align your query sequences against it to determine their orientation. You would need to choose an aligner suited to the length of your queries, but that shouldn't be hard to find. If a query turns out to be in the wrong orientation, you could then reverse complement it prior to the multiple sequence alignment.
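As a rough illustration of the idea, the sketch below orients a query by counting the k-mers it shares with a reference set, once as given and once reverse complemented. Note the assumptions: shared k-mer counting is a simple stand-in for a real aligner, and the names `kmers` and `orient`, the value k=8, and the toy sequences are all hypothetical.

```python
# Plain A/C/G/T/N complement table; illustrative only.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(query, reference_kmers, k=8):
    """Return the query, or its reverse complement if that orientation
    shares more k-mers with the reference set. Ties keep the query as is."""
    flipped = reverse_complement(query)
    forward_hits = len(kmers(query, k) & reference_kmers)
    reverse_hits = len(kmers(flipped, k) & reference_kmers)
    return flipped if reverse_hits > forward_hits else query


# Toy reference standing in for the "database of known sequences".
reference = "ATGGCGTACGTTAGCCGATAGGCTAACGT"
reference_kmers = kmers(reference, 8)

# A query that was mined in the wrong orientation.
query = reverse_complement(reference)
print(orient(query, reference_kmers) == reference)
```

With real data a proper aligner is the safer choice, since k-mer counts can be misled by repeats and by divergent sequences that share few exact k-mers.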

score 1 · Answer 3 · 2012-01-07

I haven't used any aligner that does this, although I've read that the Guidance server has a Heads-or-Tails (HoT) algorithm for working with reversed sequences. What I would do is forget about ClustalW (it has certainly done its job, but there are better aligners out there now) and use MAFFT to align two sets: your ~500 sequences as they are, and a second set of ~500 made by reversing the first. Whichever set gives the higher alignment scores is the one to focus on, and once you know the sequence orientation you can study that set in more depth.
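The second, reversed set could be generated with a short script like the one below. It is a sketch under stated assumptions: "reversing" is taken to mean reverse complementing, the helper names `read_fasta` and `write_reversed_set` and the `_rc` header suffix are made up for illustration, and only A/C/G/T/N are handled.

```python
import io

# Plain A/C/G/T/N complement table; illustrative only.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_reversed_set(src, dst):
    """Write each record of src to dst as its reverse complement,
    appending '_rc' to the header so the two sets stay distinguishable."""
    for header, seq in read_fasta(src):
        dst.write(f">{header}_rc\n{seq.translate(_COMPLEMENT)[::-1]}\n")


# In-memory stand-ins for the real input and output files.
forward = io.StringIO(">te1\nATCGCGAT\nATCG\n>te2\nAAACCG\n")
reversed_out = io.StringIO()
write_reversed_set(forward, reversed_out)
print(reversed_out.getvalue())
```

Both files can then be aligned separately and their scores compared. Recent MAFFT releases also offer an `--adjustdirection` option that reverse complements sequences as needed during alignment; check the documentation of your installed version before relying on it.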

score 1 · Answer 4 · 2021-12-29

2.9 years ago

onestop_data ▴ 330

This blog post describes an easy way to create a multiple sequence alignment (MSA) that is aware of both forward and reverse-complement orientations.

https://onestopdataanalysis.com/multiple-sequence-alignment-msa-reverse-complement/
