Question

Fasta Alignment Format - Sequences Must Be Same Length?

3

14.5 years ago

User 9996 ▴ 840

If I have a FASTA file that gives aligned protein sequences for genes, e.g. a sequence for paralog A and a sequence for paralog B, is there a requirement that the aligned sequences be the same length? Do most alignment programs yield same length sequences at the end?

thanks.

fasta alignment protein programming • 13k views

ADD COMMENT • link updated 14.0 years ago by Daniel Standage 4.1k • written 14.5 years ago by User 9996 ▴ 840

score 5 · Answer 1 · 2011-02-02

14.5 years ago

Peter ▴ 90

Yes, alignment programs usually output alignments in which all sequences have the same length. Of course, if you align two sequences of different lengths, gap characters will be introduced to make up the difference (this is what a global alignment does).
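If you want to check this for a particular file, something along these lines should work. It is only a minimal sketch: `read_fasta` and `aligned_lengths` are hypothetical helper names, the example records are made up, and it assumes a plain FASTA layout with `>` header lines and sequences that may wrap over several lines.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may wrap, so collect and join them.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def aligned_lengths(records):
    """Return the set of distinct sequence lengths (one value = equal length)."""
    return {len(seq) for _, seq in records}


# Made-up example: two aligned protein sequences, gaps written as '-'.
example = """>paralogA
MKT-LLVA
>paralogB
MKTALL--
"""

lengths = aligned_lengths(read_fasta(example))
print("same length" if len(lengths) == 1 else f"lengths differ: {sorted(lengths)}")
```

If the set contains more than one length, the file is probably unaligned FASTA rather than an alignment.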

1

I agree with Peter and Daniel.

It can be a little confusing when you look at your multi-FASTA file, because the gaps may be represented by two different symbols:

- at the beginning and end of sequences, they are represented by spaces
- inside the sequences, they are represented by '-'

I prefer to use the same symbol for all gaps.
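A rough sketch of how that could be done in Python. The name `normalize_gaps` and the example rows are made up for illustration, and padding the right-hand side up to the longest row is my assumption, since trailing spaces are often stripped when a file is saved.

```python
def normalize_gaps(seq, width=None, gap="-"):
    """Replace space-style gaps with `gap`; right-pad to `width` if given."""
    seq = seq.replace(" ", gap)
    if width is not None and len(seq) < width:
        # Assumption: missing trailing columns are gaps that were stripped.
        seq = seq + gap * (width - len(seq))
    return seq


# Made-up rows: leading spaces in the first, stripped trailing gaps in the second.
rows = ["  MKTALLV", "MKT-LL"]
width = max(len(r) for r in rows)
for row in rows:
    print(normalize_gaps(row, width))
```

After this step every row uses '-' for all gaps and has the same length.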

ADD REPLY • link 14.5 years ago by Bilouweb ★ 1.1k

score 5 · Answer 2 · 2011-02-02

No matter what the lengths of A and B are, A' and B' (the aligned sequences) will be the same length. This follows directly from the definition of an alignment. The characters (representing residues: amino acids for proteins, nucleotides for DNA or RNA) from the two sequences are arranged so as to minimize the differences between them, and the remaining empty positions (if any) are filled with gaps (dash characters). These gaps are typically interpreted as evolutionary events between two homologous sequences, i.e. an insertion of residues into one sequence or a deletion of residues from the other (indels).
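To make this concrete, here is a small sketch of a Needleman-Wunsch style global alignment. It is not what any particular alignment program does: the function name `global_align`, the flat match/mismatch/gap scores, and the two input sequences are all assumptions chosen for illustration (real protein aligners use substitution matrices and more elaborate gap penalties). The point is only that the two returned strings always have the same length.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (a_aligned, b_aligned) with '-' gaps."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner, emitting one column per step.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up sequences of different lengths (8 and 6 residues).
a_aln, b_aln = global_align("MKTALLVA", "MKTLLV")
print(a_aln)
print(b_aln)
print(len(a_aln) == len(b_aln))
```

Every step of the traceback adds exactly one character to each output, either a residue or a gap, which is why A' and B' cannot end up with different lengths.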