How to get the consensus sequence and the alternative sequence from a multiple sequence alignment?
4.4 years ago

I have a multiple sequence alignment file like this:

   >seq A
   AAACTCAGCTACG
   >seq B
   AAACACTGCTATG
   >seq C
   AAAGACTGCTATC

From this input I want to generate two sequences:

   >consensus
   AAACACTGCTATG
   >Alt
   AAAGTCAGCTACC

Is there any software that can do this? Any code would be much appreciated. Thank you.

alignment SNP

Biopython can compute a consensus sequence (see Bio.Align.AlignInfo) from alignments read with AlignIO, as long as you provide an alignment (or sequences that are already the same length).
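For the consensus alone, here is a minimal standard-library sketch of a simple majority-rule consensus, assuming the input is aligned FASTA with equal-length sequences. read_fasta and majority_consensus are hypothetical helpers, not Biopython API; for real files, Biopython's AlignIO.read(handle, "fasta") can replace the toy parser.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into {header: sequence} (toy parser for illustration)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def majority_consensus(seqs):
    """Return the most common character in each alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    # zip(*seqs) yields one tuple per column; ties go to the first base seen
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


alignment_text = """
>seq A
AAACTCAGCTACG
>seq B
AAACACTGCTATG
>seq C
AAAGACTGCTATC
"""
seqs = read_fasta(alignment_text)
print(majority_consensus(list(seqs.values())))
```

Ties are broken by whichever base appears first in the column, which is arbitrary; Biopython's consensus methods and hmmemit -c may apply thresholds or tie rules differently.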

The alt is a bit more difficult. I don't personally know of any software that produces exactly what you need, so some custom code is probably the way to go.

How do you propose the alts be generated? Do you want an alt sequence for every possible combination of the variant positions? That gets unwieldy very quickly: with k variable positions that each have two bases, there are 2^k possible alt sequences.
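To make the combinatorial growth concrete, here is a small sketch that enumerates every sequence obtainable by picking one observed base per column. all_variant_combinations is a hypothetical helper; it assumes equal-length aligned sequences. For the three example sequences, 5 columns are variable, each with two bases, so there are 2^5 = 32 combinations.

```python
from itertools import product


def all_variant_combinations(seqs):
    """Yield every sequence formed by choosing one observed base per column."""
    # Observed bases per column, sorted so the output order is reproducible
    choices = [sorted(set(column)) for column in zip(*seqs)]
    for combo in product(*choices):
        yield "".join(combo)


seqs = ["AAACTCAGCTACG", "AAACACTGCTATG", "AAAGACTGCTATC"]
variable = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
combos = list(all_variant_combinations(seqs))
print(variable, len(combos))
```

Both the consensus and the requested Alt are among these 32 sequences; with, say, 40 biallelic positions there would already be more than a trillion.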

Thanks Joe. I want to see directly how the SNPs vary within this gene family. The consensus and alt sequences would summarise the SNPs, and I would then use these two sequences to calculate Ka/Ks.
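As a possible first step toward that comparison, here is a hedged sketch that classifies codon differences between two aligned coding sequences as synonymous or non-synonymous under the standard genetic code. It assumes in-frame coding sequences of equal length; classify_codon_differences is a hypothetical helper. It is not a Ka/Ks estimate: real Ka/Ks also needs the number of synonymous and non-synonymous sites and a method for multi-hit codons (for example, as implemented in PAML).

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq1, seq2):
    """Count codon differences between two in-frame coding sequences.

    Only codons differing at exactly one position are classified as
    synonymous/nonsynonymous; codons differing at 2-3 positions are counted
    as multi_hit, and codons with gaps or ambiguity codes are skipped.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected two equal-length, in-frame coding sequences")
    counts = {"synonymous": 0, "nonsynonymous": 0, "multi_hit": 0, "skipped": 0}
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            counts["skipped"] += 1
        elif sum(a != b for a, b in zip(c1, c2)) > 1:
            counts["multi_hit"] += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            counts["synonymous"] += 1
        else:
            counts["nonsynonymous"] += 1
    return counts


# Hypothetical coding sequences (the 13-base example is not a whole number of codons)
print(classify_codon_differences("ATGCTGAAA", "ATACTAAAG"))
```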

Do you always have only 3 input sequences? What if there are more than 2 variants at a given position: how do you intend to summarise that position?

Not always three input sequences, but most of the groups have only 2 variants per position.

OK, but what do you want to do with the subset that has more than 2? That radically changes the code the task needs.

I would keep the one that occurs most frequently at that position.
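A minimal sketch of the rule discussed in this thread, assuming equal-length aligned sequences: the consensus takes the most frequent base in each column, and the alt takes the most frequent of the remaining bases, falling back to the consensus base where the column is invariant. consensus_and_alt is a hypothetical helper, not from any package; gaps are treated like any other character, and ties go to the base seen first.

```python
from collections import Counter


def consensus_and_alt(seqs):
    """Return (consensus, alt) built column by column from aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    consensus, alt = [], []
    for column in zip(*seqs):
        # Bases ranked by count; equal counts keep first-seen order
        ranked = Counter(column).most_common()
        consensus.append(ranked[0][0])
        # Second-ranked base if the column is variable, else reuse the consensus
        alt.append(ranked[1][0] if len(ranked) > 1 else ranked[0][0])
    return "".join(consensus), "".join(alt)


seqs = ["AAACTCAGCTACG", "AAACACTGCTATG", "AAAGACTGCTATC"]
consensus, alt = consensus_and_alt(seqs)
print(">consensus", consensus, ">Alt", alt, sep="\n")
```

On the three example sequences this reproduces both target sequences from the question (AAACACTGCTATG and AAAGTCAGCTACC). Columns with three or more variants keep only the most frequent non-consensus base in the alt, per the rule above.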

4.4 years ago
Mensur Dlakic

You will need a local installation of HMMER. If your alignment is in a file called aln:

hmmbuild --dna aln.hmm aln

Now you can create a consensus sequence:

hmmemit -c aln.hmm

This will print:

>aln-consensus
AAACACTGCTATG

You can also sample random sequences from this model, but in general they will not be exactly 13 bases long like your alignment, even if you set the expected length from the profile to 13:

hmmemit -p -L 13 aln.hmm

It will produce a different output each time.
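If what you need is random sequences that always match the alignment length, a simpler model than a profile HMM will do. The sketch below is a swapped-in technique, not what hmmemit does: sample_from_columns is a hypothetical helper that treats columns as independent and picks one base per column in proportion to its observed frequency, with no insert or delete states and no pseudocounts. Every sample therefore has exactly the alignment length (13 bases here).

```python
import random


def sample_from_columns(seqs, rng=None):
    """Sample one sequence, drawing each column's base by observed frequency."""
    rng = rng or random.Random()
    # Choosing uniformly among a column's entries picks each base in
    # proportion to how often it occurs in that column
    return "".join(rng.choice(column) for column in zip(*seqs))


seqs = ["AAACTCAGCTACG", "AAACACTGCTATG", "AAAGACTGCTATC"]
print(sample_from_columns(seqs, random.Random(42)))
```

Passing a seeded random.Random makes the output reproducible; without one, each call gives a different sample.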
