Randomizing the input order of sequences gives completely different alignments. Is there a way to address this problem?
More info: I am using the MUSCLE algorithm to perform the MSA.
Bob Edgar (Author of Muscle) wrote this blog entry on big alignments:
Consider using Uclust.
If that were true, then all Pfam full alignments would be nonsense. Also of interest here is a new paper from Chris Sander's group (http://www.ncbi.nlm.nih.gov/pubmed/22163331), in which the authors used "big" protein alignments to accurately predict folds using statistical methods, which is not possible with "small" alignments.
I found the following code that shuffles the order of sequences in a FASTA file. The "perl script randomly shuffles the order of sequences in a fasta file. Upon execution, specify your input file (without .fasta extension) and total no. of sequences." Feed that output into MUSCLE.
That is exactly what the code linked in my response does. It shuffles the order of the sequences, not the sequences themselves, so sequences 1,2,3,4,5 become, for example, 4,2,3,5,1. With this randomized ordering of the input sequences, you can test for input-order bias.
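In case the linked Perl script is not at hand, here is a rough Python sketch of the same idea. It is not the script referred to above; the function names (`parse_fasta`, `format_fasta`, `shuffle_records`) and the `seed` parameter are my own, made up for illustration. It only reorders whole records and leaves each sequence untouched.

```python
import random


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=60):
    """Write records back as FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


def shuffle_records(records, seed=None):
    """Return a copy of the records in random order (sequences unchanged)."""
    rng = random.Random(seed)  # a fixed seed makes the shuffle repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    return shuffled
```

Writing `format_fasta(shuffle_records(parse_fasta(text), seed=k))` to a file for a few different values of `k` gives several input orderings to feed to the aligner and compare.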
Sorry if my query was not clear. I am looking for ways to remove the input-order bias. In other words, if I change the input order, I obtain a completely different alignment. I want to know whether there is any way to always obtain a similar (if not identical) alignment, no matter what the input order is.
I think this "problem" arises because at some stage an asymmetric pairwise distance measure is computed, i.e. the result depends on ordering. However, I'm not sure where exactly this happens in MUSCLE. The first distance used there (k-mer distance) should be symmetric. Does running with -maxiter 1 always give the same result? A manual way to get rid of this would be to sort the sequences first according to some criterion (e.g. length), but there is of course no guarantee that this would give better alignments.
Andreas
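To make the sorting suggestion concrete, here is a small Python sketch. The sort key (longest first, then the sequence itself, then the header as tie-breakers) and the name `canonical_order` are assumptions of mine, not anything MUSCLE prescribes. The only point is that every shuffled copy of the same set ends up in the same order before it reaches the aligner; as said above, that does not make the alignment any better.

```python
import random


def canonical_order(records):
    """Sort (header, sequence) records into an order that does not depend
    on how they were listed in the input file.

    Key: longest sequence first, then the sequence text, then the header,
    so ties on length are also broken deterministically.
    """
    return sorted(records, key=lambda rec: (-len(rec[1]), rec[1], rec[0]))


# Toy records, made up for illustration.
records = [
    ("seq1", "MKTAYIAK"),
    ("seq2", "MKT"),
    ("seq3", "MKTAYI"),
    ("seq4", "MKA"),
]

# Any permutation of the input collapses to the same canonical order.
shuffled = records[:]
random.Random(0).shuffle(shuffled)
assert canonical_order(shuffled) == canonical_order(records)
```

Two records with the same header and the same sequence are indistinguishable, so the result is still well defined in that case.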
How many sequences are you aligning with MUSCLE?
usually in the range of 15K-30K (funny, but true)
Do you have similar issues in other alignment programs? Have you tried MAFFT? http://mafft.cbrc.jp/alignment/software/