Question

How to align multiple genomic regions simultaneously

7.3 years ago

liaoyunshi • 0

Many thanks for reading my question.

I recently set out to align sequences from several different genomic regions together with whole-genome sequences for a subsequent phylogenetic study, but none of the tools I know can do what I need. Could anyone offer suggestions? Thanks a lot.

For example, say the whole genome consists of three regions: A, B, and C. I have some complete genome sequences, plus some sequences of region A, some of region B, and some of region C. I want to align all four kinds of sequences simultaneously, in a single alignment run (I do not want to align A, B, and C to the whole-genome sequences separately). I have tried many multiple alignment tools, but none of them does this. Can anyone help me solve this problem?

Thanks!

alignment phylogenetics • 2.3k views

ADD COMMENT • link 7.3 years ago by liaoyunshi • 0

I don't really get it. Is the problem that the aligner takes one read and maps it on its own? That is how mapping works: each read is aligned independently of the other reads.

ADD REPLY • link 7.3 years ago by stolarek.ir ▴ 700

Thanks for your reply. Actually, I want to do a multiple sequence alignment of "reference + A + B + C" at the same time.

ADD REPLY • link 7.3 years ago by liaoyunshi • 0

liaoyunshi: As stated, this question is not clear. Are you referring to pairwise alignments or multiple sequence alignments? Those are two different things. It sounds to me like you want to do a multiple sequence alignment of "reference + A + B + C" at the same time. Is that correct?

ADD REPLY • link 7.3 years ago by GenoMax 147k

Yes, you are right. I want to do an MSA of all sequences at the same time.

ADD REPLY • link 7.3 years ago by liaoyunshi • 0

Any MSA program should be able to do that. Is the reference very long compared to A/B/C?

ADD REPLY • link 7.3 years ago by GenoMax 147k

Not very long: the reference is about 10 times as long as A/B/C. But I think the problem is that the MSA program will try to place A/B/C in partly overlapping columns, when in fact they should be completely separate because they come from different regions of the genome.

ADD REPLY • link 7.3 years ago by liaoyunshi • 0
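
To make the concern above concrete, here is a minimal Python sketch of the layout being asked for: each fragment is padded with gaps so that it occupies only the reference columns it belongs to. The function name `place_on_reference`, the toy sequences, and the exact-substring lookup are all assumptions for illustration. Real fragments carry mismatches and indels, so they would need a proper aligner rather than `str.find`.

```python
def place_on_reference(reference, fragments):
    """Pad each fragment with gaps so it sits in the reference columns it matches.

    Toy assumption: every fragment occurs exactly (no mismatches, no indels)
    somewhere in the reference. Real data would need a real aligner.
    """
    rows = {"reference": reference}
    for name, seq in fragments.items():
        start = reference.find(seq)  # first exact occurrence, or -1
        if start < 0:
            raise ValueError(f"{name} does not occur in the reference")
        right_pad = len(reference) - start - len(seq)
        rows[name] = "-" * start + seq + "-" * right_pad
    return rows


# Hypothetical toy data: one reference and one fragment per region.
reference = "ACGTACGGTTTCAGCATTGACCGTAGGCTA"
fragments = {"A_1": "ACGTACGG", "B_1": "TCAGCATTG", "C_1": "CGTAGGCTA"}

rows = place_on_reference(reference, fragments)
for name, row in rows.items():
    print(f"{name:>9}  {row}")
```

In this layout the A, B, and C fragments never share a column with each other, only with the reference, which is the property a general-purpose MSA run does not guarantee.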

If you know that they come from different regions of the genome, then why do you want to align them at the same time?

ADD REPLY • link 7.3 years ago by GenoMax 147k

Because I have a huge number of sequences from GenBank. I only know that they come from different regions; I don't know exactly which region each sequence belongs to, so it is hard to separate them up front. Instead, I need to do one large full alignment first, which will tell me the region of each sequence.

ADD REPLY • link 7.3 years ago by liaoyunshi • 0
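
One possible way to sort the sequences by region before any MSA, sketched here under stated assumptions, is to compare each sequence with the reference and see where it lands. This toy version counts shared k-mers instead of running a real local alignment. The names `kmer_index` and `assign_region`, the value of `K`, the toy sequences, and the region coordinates are hypothetical and do not come from the thread.

```python
from collections import Counter, defaultdict

K = 4  # k-mer length; far too small for real genomes, chosen for the toy data


def kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def assign_region(seq, index, regions, k):
    """Return the region whose reference interval collects the most k-mer hits.

    `regions` maps a region name to a half-open (start, end) interval on the
    reference. Returns None when the sequence shares no k-mer with the reference.
    """
    votes = Counter()
    for i in range(len(seq) - k + 1):
        for pos in index.get(seq[i:i + k], ()):
            for name, (start, end) in regions.items():
                if start <= pos < end:
                    votes[name] += 1
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data; the region coordinates are assumed, not measured.
reference = "ACGTACGGTTTCAGCATTGACCGTAGGCTA"
regions = {"A": (0, 10), "B": (10, 20), "C": (20, 30)}
queries = {
    "seq1": "ACGTACGG",    # exact copy of part of region A
    "seq2": "CAGCTTTGAC",  # region B with one substitution
    "seq3": "GTAGGCTA",    # exact copy of part of region C
}

index = kmer_index(reference, K)
assignments = {name: assign_region(seq, index, regions, K)
               for name, seq in queries.items()}
print(assignments)
```

Once every sequence carries a region label, the sequences can be grouped by region and aligned group by group against the corresponding part of the reference.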