4.4 years ago
wangyu.ashley
I have a multiple sequence alignment file like this:
>seq A
AAACTCAGCTACG
>seq B
AAACACTGCTATG
>seq C
AAAGACTGCTATC
And I want to generate two sequences from the input file:
>consensus
AAACACTGCTATG
>Alt
AAAGTCAGCTACC
Is there any software that can be used to achieve this task? Any code would be much appreciated! Thank you.
Biopython's AlignIO has consensus sequence functionality if you are providing alignments (or sequences which are already the same length). The alt is a bit more difficult. I don't personally know of any software that could produce exactly what you need, so some custom code is probably the way to go.
How are you proposing the alts be generated? Do you want an alt sequence for every possible combination of the variant positions? This will get unwieldy very quickly...
Thanks Joe. I want to directly detect how the SNPs change within this gene family group. The consensus and alt sequences summarise the SNPs, and I would then use these two sequences to calculate Ka/Ks.
Do you always have only 3 input sequences? What if there are more than 2 variants for a given position? How do you intend to summarise that position?
There are not always only three input sequences, but most of the groups have only 2 variants.
OK, but what do you want to do with the subset that has more than 2? This will radically change the code the task needs.
I will keep the one that occurs more frequently at that position.
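Given that rule, here is a minimal sketch in plain Python (no Biopython needed) of one way this could be done. It assumes the sequences are already aligned and of equal length, that the consensus is the most frequent base in each column, and that the alt is the most frequent remaining base (falling back to the consensus base where the column has no variation). The function names `parse_fasta` and `consensus_and_alt` are made up for this example, and ties are broken by whichever base appears first in the input, which may not be what you want.

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def consensus_and_alt(seqs):
    """Return (consensus, alt) for a list of equal-length aligned sequences.

    Assumption: per column, consensus = most frequent base and
    alt = most frequent remaining base (or the consensus base if
    the column is invariant). Ties follow input order.
    """
    seqs = list(seqs)
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    consensus, alt = [], []
    for column in zip(*seqs):
        ranked = Counter(column).most_common()
        consensus.append(ranked[0][0])
        alt.append(ranked[1][0] if len(ranked) > 1 else ranked[0][0])
    return "".join(consensus), "".join(alt)


# The three sequences from the question.
fasta_text = """>seq A
AAACTCAGCTACG
>seq B
AAACACTGCTATG
>seq C
AAAGACTGCTATC
"""

cons, alt = consensus_and_alt(parse_fasta(fasta_text).values())
print(">consensus")
print(cons)
print(">Alt")
print(alt)
```

On the three sequences from the question this reproduces the consensus and Alt sequences you listed. Gap characters are counted like any other symbol here, so you may want to filter them out before ranking if your alignments contain gaps.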