I have 10 or more very closely related DNA sequences (from different strains), each aligned pairwise to a reference chromosome. How can I generate a multiple sequence alignment that contains the reference and all of these sequences, without changing any of the individual alignments to the reference? Please don't trim the gapped (blank) regions.
For example:
Input:
Original reference:
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Pairwise alignments:
Ref1: CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC
Ref1: CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC
Seq2: CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC
Ref1: CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq3: CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
Output:
Final multiple sequence alignment:
Ref1: CGACAAT--GCACGACAGAGGAAG----C--AGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC
Seq1: CGACAATAAGCACGACAGAGGAAG----C--AGAACAGATA-----ATTGCCTCTCA----TTTTC-CTCCC
Seq2: CGACAAT---CACGACAGAGGAAG----CTTAGAACAGATATTTAG---GCCTCTCA----TTTTCTCTCCC
Seq3: CGACAAT--GCACGACAGAGGAAGTTTTC--AGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC
Someone has asked this question before, but I can't get the script from that thread to run: http://www.perlmonks.org/bare/?node_id=866127
Thank you!
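The merge shown in the example can be done without re-aligning anything: record where each pairwise alignment inserts bases relative to the reference, find the widest insertion at every reference offset, and pad every row to that width. Below is a minimal Python sketch of that idea, not a tested tool. The function name `merge_pairwise` and the `(name, ref_aln, seq_aln)` input layout are made up for illustration. It assumes every pairwise alignment covers the same full-length reference, and it simply left-justifies insertions that fall at the same offset instead of aligning them to each other.

```python
def merge_pairwise(pairs, ref_name="Ref1"):
    """Merge reference-anchored pairwise alignments into one MSA.

    pairs: list of (name, aligned_reference, aligned_sequence) strings.
    Returns a list of (name, aligned_row), reference row first.
    """
    ref = None
    parsed = []
    for name, ref_aln, seq_aln in pairs:
        if len(ref_aln) != len(seq_aln):
            raise ValueError(f"{name}: aligned strings differ in length")
        ungapped = ref_aln.replace("-", "")
        if ref is None:
            ref = ungapped
        elif ungapped != ref:
            raise ValueError(f"{name}: reference differs from the first pair")
        inserts = {}   # reference offset -> bases inserted before that offset
        matched = []   # one character (base or '-') per reference base
        pos = 0
        for r, s in zip(ref_aln, seq_aln):
            if r == "-":
                if s != "-":
                    inserts[pos] = inserts.get(pos, "") + s
            else:
                matched.append(s)
                pos += 1
        parsed.append((name, inserts, matched))

    # Widest insertion seen at each offset (0 .. len(ref)).
    width = [0] * (len(ref) + 1)
    for _, inserts, _ in parsed:
        for pos, ins in inserts.items():
            width[pos] = max(width[pos], len(ins))

    def build(inserts, matched):
        out = []
        for pos in range(len(ref) + 1):
            # Pad this row's insertion (possibly empty) to the column width.
            out.append(inserts.get(pos, "").ljust(width[pos], "-"))
            if pos < len(ref):
                out.append(matched[pos])
        return "".join(out)

    rows = [(ref_name, build({}, list(ref)))]
    rows += [(name, build(ins, m)) for name, ins, m in parsed]
    return rows


pairs = [
    ("Seq1",
     "CGACAAT--GCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
     "CGACAATAAGCACGACAGAGGAAGCAGAACAGATA-----ATTGCCTCTCATTTTC-CTCCC"),
    ("Seq2",
     "CGACAATGCACGACAGAGGAAGC--AGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCC",
     "CGACAAT-CACGACAGAGGAAGCTTAGAACAGATATTTAG---GCCTCTCATTTTCTCTCCC"),
    ("Seq3",
     "CGACAATGCACGACAGAGGAAG----CAGAACAGATATTTAGATTGCCTCTCA----TTTTCTCTCCC",
     "CGACAATGCACGACAGAGGAAGTTTTCAGAACAGATATTTAGATTGCCTCTCAAAAATTTTCTCTCCC"),
]

msa = merge_pairwise(pairs)
for name, row in msa:
    print(f"{name}: {row}")
```

With the three pairwise alignments from the example as input, this should print the four rows given under the final multiple sequence alignment. For real data the aligned strings would be read from the alignment files (for example with Biopython) rather than typed in.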
I'm confused. Do you want to take the aligned query sequences from several pairwise alignments and compare them all against one reference?
jrj.healey:
result
You may not get an optimal answer from a default run of an MSA program in a case like this. You may have to edit the alignment manually (within reason) to get the exact result you want.
Having to do some editing (within reason) to a multiple sequence alignment is pretty normal. I wouldn't put that in the bucket of MSA not giving you optimal results. I spent my entire PhD doing highly divergent MSA and phylogenetic analysis, and this is definitely a case that MSA is entirely appropriate for.
I meant to say that the answer from a default run of an MSA program may not be usable as-is. I have amended the post above.