Many thanks for reading my question.
I want to align multiple sequences from different genomic regions against whole-genome sequences for a subsequent phylogenetic study, but none of the tools I know can do what I need. Can anyone give me some suggestions? Thanks a lot.
For example, say the whole genome consists of three regions: A, B, and C. I have some complete genome sequences, plus some sequences of region A, some of region B, and some of region C. I want to align all four kinds of sequences simultaneously, in a single alignment run (I do not want to align A, B, and C to the whole-genome sequences separately). I have tried many multiple alignment tools but found none that can do this, so I wonder if anyone can help me solve this problem.
Thanks!
I don't really get it. Is the problem that the aligner takes one read and maps it? That is how it is done: the alignment of each read is independent of the other reads.
Thanks for your reply. Actually, I want to do a multiple sequence alignment of "reference + A + B + C" at the same time.
liaoyunshi: As stated, this question is not clear. Are you referring to pairwise alignments or multiple sequence alignments? Those are two different things. It sounds to me like you want to do a multiple sequence alignment of "reference + A + B + C" at the same time. Is that correct?
Yes, you are right. I want to do an MSA of all sequences at the same time.
Any MSA program should be able to do that. Is the reference very long compared to A/B/C?
Not very long: the reference is about 10 times the length of A/B/C. But I think the problem is that the MSA program will try to place A/B/C in partly overlapping columns, when in fact they should be completely separate because they come from different regions of the genome.
If you know that they come from different regions of the genome then why do you want to align them at the same time?
Because I have a huge number of sequences from GenBank. I only know that they come from different regions; I don't know exactly which region each sequence belongs to. So it is hard to separate them at the start. Instead, I first need to do one large full alignment, which can tell me the region of each sequence.
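If the goal is only to find out which region each sequence belongs to, a full MSA may not be needed for that step. A minimal sketch, assuming a single reference genome with known region coordinates: index the k-mers of the reference, let each fragment's k-mers vote for the reference offset they imply, and assign the fragment to the region that contains its midpoint. The toy sequences, the region coordinates, and the function names (`kmer_index`, `estimate_offset`, `assign_region`) are all hypothetical and only illustrate the idea; for real GenBank data a local aligner such as BLAST run against the reference would likely be more robust to divergence and indels.

```python
from collections import Counter, defaultdict


def kmer_index(reference, k):
    """Map each k-mer to the list of positions where it starts in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def estimate_offset(fragment, index, k):
    """Return the reference offset of the fragment start with the most k-mer votes.

    Returns None when the fragment shares no k-mer with the reference.
    """
    votes = Counter()
    for j in range(len(fragment) - k + 1):
        for pos in index.get(fragment[j:j + k], ()):
            # A k-mer at fragment position j matching reference position pos
            # implies the fragment starts at reference position pos - j.
            votes[pos - j] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def assign_region(fragment, index, k, regions):
    """Assign a fragment to the region containing its estimated midpoint.

    regions maps a region name to a half-open (start, end) interval on the
    reference; these coordinates are assumed to be known in advance.
    """
    offset = estimate_offset(fragment, index, k)
    if offset is None:
        return None
    midpoint = offset + len(fragment) // 2
    for name, (start, end) in regions.items():
        if start <= midpoint < end:
            return name
    return None


# Toy data (hypothetical): a 36-base "genome" made of three 12-base regions.
reference = "ATGCGTACCTGA" + "GGCTTAACGGAT" + "TTCAGGCATCCA"
regions = {"A": (0, 12), "B": (12, 24), "C": (24, 36)}
index = kmer_index(reference, k=4)

for fragment in ["CGTACCTG", "CTTAACGG", "CAGGCATC", "NNNNNNNN"]:
    print(fragment, assign_region(fragment, index, 4, regions))
```

Once every sequence has a region label, each group can be aligned with the matching part of the whole-genome sequences, and the per-region alignments can then be concatenated so that A, B, and C occupy separate column blocks.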