11.1 years ago
SRKR
I have a set of aligned sequences in FASTA format and want to derive a consensus from the alignment. At most sites, one base clearly occurs most often. At sites where two or more bases occur an equal number of times, which base should be taken? An example is given below:
Seq_1: ATGCGA
Seq_2: AT-CGT
Seq_3: AT-CCG
Seq_4: AT-CCC
Seq_5: AA-CT-
By convention, this would be the consensus:
Consensus : A T G C [G/C] N
But this consensus sequence throws an error when aligned with other sequences. What should be done in such a scenario, and how do I get a consensus for such sites?
Depending on what you want to do downstream, you might be able to use IUPAC codes, such as S for [G/C].
I can use IUPAC codes, but the application simply ignores them, which affects the alignment. I am using MEGA 4.0. Also, even if the application picked a random base based on the letter, that would technically be a glitch.
Ah, you should really update your question to mention MEGA 4.0 and the other details of exactly what you're doing. Otherwise, you'll only ever get a rather generic reply like mine. With more details, hopefully someone familiar with MEGA can provide some insight into this.
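For reference, the per-column majority rule from the question, combined with the IUPAC suggestion for tied sites, could be sketched roughly as follows. This is only an illustration, not what MEGA or any other tool does internally: the function name `consensus` is made up here, gaps are assumed to be ignored when counting (which matches the G at the third site in the example), and a site that is all gaps is assumed to stay a gap.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}

def consensus(seqs, gap="-"):
    """Majority-rule consensus; tied sites become IUPAC ambiguity codes.

    Assumption: gaps are skipped when counting, and an all-gap column
    is reported as a gap.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*seqs):
        counts = Counter(b.upper() for b in column if b != gap)
        if not counts:
            out.append(gap)
            continue
        top = max(counts.values())
        # All bases that share the highest count at this site.
        tied = frozenset(b for b, n in counts.items() if n == top)
        # Anything outside A/C/G/T (or an unknown combination) falls back to N.
        out.append(IUPAC.get(tied, "N"))
    return "".join(out)

alignment = ["ATGCGA", "AT-CGT", "AT-CCG", "AT-CCC", "AA-CT-"]
print(consensus(alignment))  # ATGCSN
```

On the example alignment, the fifth site (G and C tied at two each) becomes S and the sixth (A, T, G and C once each) becomes N. Whether a downstream program then handles S or N sensibly is a separate question that depends on that program.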