Fasta Alignment Format - Sequences Must Be Same Length?
13.8 years ago
User 9996

If I have a FASTA file that gives aligned protein sequences for genes, e.g. a sequence for paralog A and a sequence for paralog B, is there a requirement that the aligned sequences be the same length? Do most alignment programs yield same length sequences at the end?

thanks.

Tags: fasta, alignment, protein, programming
13.8 years ago
Peter

Yes, alignment programs usually output alignments in which all sequences have the same length. Of course, if you align two sequences of different lengths, gap characters ('-') will be introduced so that both end up equally long (global alignment).
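To check whether an aligned FASTA file really has equal-length sequences, a small Python sketch could look like the one below. The hand-rolled parser, the function names `read_fasta` and `is_rectangular`, and the `paralog_A`/`paralog_B` records are hypothetical and only for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Sequence lines are concatenated verbatim (no stripping), so any
    space-encoded gaps still count toward the length.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_rectangular(records):
    """True if every sequence has the same length (a valid alignment block)."""
    return len({len(seq) for _, seq in records}) <= 1


# Hypothetical example: two aligned paralogs, both 11 columns long.
aligned = """>paralog_A
MKT-AYIAKQR
>paralog_B
MKTLAY--KQR
"""
print(is_rectangular(read_fasta(aligned)))  # True

# The same sequences without gaps (10 vs 9 residues) are not an alignment.
unaligned = ">paralog_A\nMKTAYIAKQR\n>paralog_B\nMKTLAYKQR\n"
print(is_rectangular(read_fasta(unaligned)))  # False
```

A list of pairs is used instead of a dict so that duplicate headers are not silently merged.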

I agree with Peter and Daniel.

It can be a little bit confusing when you look at your multi-FASTA file, because gaps may be represented by two different symbols:
- at the beginning and end of sequences, they are represented by spaces;
- inside the sequences, they are represented by '-'.

I prefer to use the same symbol for all gaps.
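If you also prefer a single gap symbol, a minimal Python sketch could convert space-encoded gaps to '-'. It rests on two assumptions: every space in a sequence line is a gap, and a sequence shorter than the others lost its trailing spaces (for example to an editor that strips whitespace), so it is right-padded with '-'. The function name `normalize_gaps` and the example rows are made up for illustration.

```python
def normalize_gaps(seqs, gap="-"):
    """Use one gap symbol everywhere.

    Assumption: spaces denote gaps, and a too-short sequence lost trailing
    spaces, so it is right-padded with the gap symbol to the longest length.
    """
    width = max((len(s) for s in seqs), default=0)
    return [s.replace(" ", gap).ljust(width, gap) for s in seqs]


# Hypothetical rows: leading-space end gaps, '-' internal gaps,
# and a row whose trailing spaces were stripped.
rows = ["  TAYIA", "MKT-AYIA", "MKTLAY"]
for row in normalize_gaps(rows):
    print(row)
# --TAYIA-
# MKT-AYIA
# MKTLAY--
```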

13.8 years ago

No matter what the lengths of A and B are, A' and B' (the aligned sequences) will be of the same length. This follows directly from the definition of an alignment. The characters (representing nucleotides or amino acids) from the two sequences are arranged so as to minimize the differences between them (more precisely, to optimize an alignment score), and then the empty positions (if any) are filled in with gaps (dash characters). These gaps are typically interpreted as evolutionary events between two homologous sequences, i.e. an insertion of residues into one sequence or a deletion of residues from the other (indels).
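To see why equal length falls out of the definition, here is a minimal Python sketch of global alignment (the Needleman-Wunsch algorithm) with a linear gap penalty. The scores (match +1, mismatch -1, gap -1) are toy assumptions, not a real substitution matrix such as BLOSUM62, and this is not claimed to be the exact method of any particular alignment program. Each traceback step emits exactly one column, either a residue pair or a residue paired with '-', so A' and B' always come out the same length.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). Scores are toy values.
    """
    def pair(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),  # residue vs residue
                dp[i - 1][j] + gap,                          # a[i-1] vs '-'
                dp[i][j - 1] + gap,                          # '-' vs b[j-1]
            )

    # Traceback: every step appends one character to BOTH outputs,
    # which is why the aligned sequences always have equal length.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), dp[n][m]


# A (10 residues) and B (8 residues) become two 10-column rows.
a_aln, b_aln, score = global_align("MKTAYIAKQR", "MKTAYKQR")
print(a_aln)  # MKTAYIAKQR
print(b_aln)  # MKTAY--KQR
print(score)  # 6
```

Removing the '-' characters from A' and B' gives back A and B unchanged; the gaps only add columns.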
