I have a number (~500) of 1-2 kbp sequences that I'd like to align using clustalw; in this case, the sequences come from transposable elements. The problem is that the sequences (obviously) have a distinct directionality, and the program I use to mine them from a genomic contig set doesn't guarantee any particular orientation. My concern is that for some sequences I should be aligning the reverse complement rather than the original orientation.
Is there a way in clustalw to get around this problem? Is there some other program that can put my sequences in the proper orientation before I align them with clustalw?
A quick example:
ATCGCGATATCG and CGATATCGCGAT can clearly be aligned, since the second sequence is the reverse complement of the first. If I ran clustalw with them as is, however, the alignment would be far from ideal.
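One simple way to orient sequences before running clustalw is a heuristic that is not a built-in feature of clustalw or PAGAN: pick a reference sequence and flip any sequence whose reverse complement shares more k-mers with that reference than its original orientation does. The Python sketch below assumes the sequences are plain A/C/G/T strings already loaded into a list. It uses the first sequence as the reference, and `orient`, `kmers`, and `reverse_complement` are hypothetical helper names.

```python
# A minimal sketch (not a clustalw or PAGAN feature): orient each sequence so
# that it shares the most k-mers with a reference, here the first sequence.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq, uppercased."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seqs: list[str], k: int = 8) -> list[str]:
    """Orient every sequence to match the strand of seqs[0].

    A sequence is replaced by its reverse complement when that shares more
    k-mers with the reference; ties keep the original orientation.
    """
    if not seqs:
        return []
    ref = kmers(seqs[0], k)
    oriented = [seqs[0]]
    for seq in seqs[1:]:
        rc = reverse_complement(seq)
        fwd_hits = len(ref & kmers(seq, k))
        rc_hits = len(ref & kmers(rc, k))
        oriented.append(rc if rc_hits > fwd_hits else seq)
    return oriented
```

With the default `k = 8`, `orient(["ATCGCGATATCG", "CGATATCGCGAT"])` flips the second sequence and returns the first sequence twice. Two caveats apply. First, with a small `k` this toy pair can tie, because the two strings also overlap substantially in the forward direction. Second, comparing against a single reference assumes every element shares enough sequence with it, which may not hold across divergent transposable-element families.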
Hi David,
PAGAN is a new program and is still under active development. The feature you were using is very recent and indeed didn't support DNA ambiguity codes. I've now pushed an updated version that fixes this issue; the latest version can be obtained with 'git'.
The new version should perform reverse-complement alignment with ambiguity characters. It also supports translated alignment, including translated alignment using the best ORF in the read sequences. Unfortunately I haven't had time to document all the new features, so please contact me if you find them interesting.
Regards, Ari
Does PAGAN allow unknown ('N') characters? I'm getting an error that says: "Unexpected characters found. Reverse-complement failed".
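That error suggests the reverse-complement step met characters outside the plain A/C/G/T alphabet. If you orient sequences yourself before aligning, one workaround is a reverse complement that covers the full IUPAC nucleotide alphabet. The sketch below is an assumption and is not PAGAN's own code: N stays N, R (A/G) becomes Y (C/T), and any truly unknown character raises an error. The name `reverse_complement_iupac` is hypothetical.

```python
# A minimal sketch of an IUPAC-aware reverse complement (not PAGAN's code).
# Ambiguity codes map to their complements (R<->Y, K<->M, B<->V, D<->H);
# S, W, N and the gap '-' map to themselves. Other characters raise an error.

IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N", "-": "-",
}


def reverse_complement_iupac(seq: str) -> str:
    """Return the reverse complement of seq, preserving case and IUPAC codes."""
    out = []
    for base in reversed(seq):
        comp = IUPAC_COMPLEMENT.get(base.upper())
        if comp is None:
            raise ValueError(f"unexpected character {base!r} in sequence")
        # Keep lowercase (e.g. soft-masked) bases lowercase.
        out.append(comp if base.isupper() else comp.lower())
    return "".join(out)
```

For example, `reverse_complement_iupac("AARN")` returns `"NYTT"`, so sequences containing N no longer stop the flip.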