I am using Biopython (version 1.72) to reconstruct a parsimony tree from a multiple alignment of amino acid sequences. As example data I used a PHYLIP-format file of aligned protein sequences. The code below runs successfully and produces an evolutionary tree:
from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import (
    DistanceCalculator,
    DistanceTreeConstructor,
    NNITreeSearcher,
    ParsimonyScorer,
    ParsimonyTreeConstructor,
)

# Read the multiple sequence alignment (PHYLIP format)
aln = AlignIO.read('example.phy', 'phylip')

# Build a neighbor-joining tree to serve as the starting tree
calculator = DistanceCalculator()
dm = calculator.get_distance(aln)
constructor = DistanceTreeConstructor()
starting_tree = constructor.nj(dm)

# Refine the starting tree with a parsimony NNI search
scorer = ParsimonyScorer()
searcher = NNITreeSearcher(scorer)
constructor = ParsimonyTreeConstructor(searcher, starting_tree)
pars_tree = constructor.build_tree(aln)
Phylo.draw_ascii(pars_tree)
Below is the example sequence file in phylip format:
14 387
Zebrafish ESLLRFGLRS DLDFRLSLNG KEDLLDTGQS LSSCGVVSGD LISVILPASS
Fugu ETVLSVGLSA ETEISLSLNG SEPLEDTGQT LASCGIVSGD LIRVALIRAA
Chicken RALLAWGYSS DTEFSITLNG KDALTEDEKT LASYGIVPGD LICLLLEETD
Zebra SMTENRTAGS DTAFSVTLNR KDALTEDQKT LASYGIVSGD LICLLLEEPD
Anole QALLSWGYSS ETKFEITLNN KDSLVGDQDT LASFGIVSGD LICLILEDDA
Human QSLLTWGYSS NTRFTITLNY KDPLTGDEET LASYGIVSGD LICLILQDDI
Chimpanzee QSLLTWGYSS NTRFTITLNY KDPLTGDEET LASYGIVSGD LICLILQDDI
Dog QALLSWGFGS DTRFAITLNN KDALTGDEET LASYGIVSGD LICLILEDAI
Cow QALPTWGYSS DARFAITLNN KDALTGDEET LASYGIVSGD LLCLILEDAI
Elephant QALLTWGYSS DTRFAITLNN KDALTEDEET LASYGIVSGD LICLILEDAI
Mouse QVLLTLGFSS DTRFAITLNN KDALTGDEET LASYGIVSGD LICLVLEDDM
Platypus MDMSWTTLSP DTRFSITLNK KDALTEDNET LASYGIVCGD LICVVLEDAS
Xenopus SVTLALGYST EANFTITLNG KDALTGDQNT LESAGIISGD LIVVVLPDSQ
Amphioxus PVLGQYGLGD DMPFEISLNG RDALLGDDKP LSDLGIVSGD LIHILLASVD
QTSSAAHQTH TDQQSSQECV DLQQDCMDQQ QQQEQECVCA AAPPLLCCEA
DAPDRDDGGG HSEQVSQEAK LPDASGASTD SDQAPGPAAS CWEPMLCSET
LPPPSSSPPS LQNGKNGSSL EFPSGLVPED VDLEEGTGSY PSEPMLCSEA
LPPPPATPAP LQNGNNGSSL EFPSGLVPED ADLEEGTGSY PSEPMLCSEA
SSPSSSLPSS QSNHHSGPSQ EFTSEGGPDD LDLQEATGSF PSEPMLCCEA
IPSSTSEHSS LQNNSNGPSQ NFEAESIQDN AHMAEGTGFY PSEPMLCSES
IPSSTSEHSS LQNNSNGPSQ NFEAESIQDN AHMAEGTGFY PSEPMLCSES
LPSSTSEHSS LQNNSNGPSQ NFEAGSVQDV VDMEEGTGVF PLEPMLCSES
LPSSTSEHSS VQNNSSGPGQ HFEAEAVPDV VDVEEGTGYY LAEPMLCSES
LPSSTSEHSS LQNNSNGPSQ DVEVESVQDT VDVEEGTGFY PSEPMLCSEA
LPSSTTEHSS LQDNPSGPSQ NVEAESIQDA MSMEEVSGFH PLEPMLCNET
LPSPTSALSG LQDRSEGTSL EFDPGLTQHD MDLEETAGSY FFEPMLCSEA
QPAPAAPDKR DLGCCSMAQA AQQAPPSAED VAMEDGMSGP AWEVMLCSEA
NHNTQQGQHP SSPERATEHA ASCGNDTKLA TSKSDTTKAD PDKALSKNPV
EDGLLPLALE RLLDSSTCRS PSDCLMLALH LLLLETGFIP QGGAVSSGEM
DEGQAPWSLE LLYHSAQVSG PGDALVVAAN LLMIETGFSP QDSQLKPAEM
ADGEIPHSLE VLYLSAECTS ATDALIVLVH LLMMETGYVP QGTEAKAVSM
ADGETPHSLE MLYLSAECTS ATDALIVLVH LLMMETGYVP QGIEAKAVFM
TDGQVPHSLQ TLYHSAECTN ANDALIVSIH LIMMETGYVP QGTEAKASSM
VEGQVPHSLE TLYQSADCSD ANDALIVLIH LLMLESGYIP QGTEAKALSM
VEGQVPHSLE TLYQSADCSD ANDALIVLIH LLMLESGYIP QGTEAKALSM
VEGQVPHSLE ILYQSADCSN PNDALIISIH LLMLESGYIP QGTEAKALSM
VEGQVPHSLE ILYQSADCLN PCDALIVSIH LLMLESGYIP QGTEAKAVSM
VEGQVPHSLE TLYHAAGCSD ASDALIVLIH LLMLESGYIP QGTEAKAVSM
EDGQVPHSLE TLYQSAGCSN ISDALIVLVH LLMLESGYIP QGTETKAVTM
VDGQVPHSLE TLYHSAECTG TNDALIVLVH LLMLESGYLP LGTEAKAGSM
LDGKIPHSLE VLYQTAGCTG ASDAFIVAIH LLMLETGYLQ KGAESKVLCM
NKDEKETVVV DTEESAVNDG DGDKEVLMDV EEEEAVSLSM EGAGVGGIPS
PIGWQAAGVF RLQYVHPLLE NSLVSVVAVP MGQTLVINAV LKMETSLENS
PAGWRCGGVY KLQYSHRLCG DSVVVMVAVS MGSALIINGL LEVNQSADSV
PEKWRGNGVY KLQYTHPLCE EGSAGLTCVP LGDLVAINAT LKINREIKGV
PEKWRGNGVY KLQYTHPLCG EGCAGLTCVP LGDLIAINAT LKINEEIRSV
PENWRNKGVY KLLYTHPLCE NGFAVLTCVP LGNLIVVNAM LKITSDIKSV
PEKWKLSGVY KLQYMHPLCE GSSATLTCVP LGNLIVVNAT LKINNEIRSV
PEKWKSSGVY KLQYMHPLCE GSSATLTCVP LGNLIVVNAT LKINNEIRSV
PENWKSGGVY KLQYTHPLCE GGSAALTCVP LGTLIVINAT LKINTELRSV
PQNWRLGGVY KLQYTHPLCE GGSAALTCVP LGNLIVINAT LKINSEVRSV
PEKWKSGGVY KLQYTHPLCN GGSAALTCVP LGKLIVVNAT LKINSEIRSV
PEKWKSSGVY KLQYTHPLCE GGFAVLTCVP LGNLIIINAT IKVNGGIKNV
PEKWRSGGVY KLQYTHPLCE DGCAALICVP LGNLIVINAT LKINNEIRSV
PQDWRSGGAY RLHYTHPLCA EVSATLACLP MGKLVIINAT LKINSEMKSV
EPMLCREAVD SVPLVLQQVY RANDVKSRHD ALCMVLHVME SGFSAKEPSS
RKLLLKPDEY VTAWTGGSSG VVYRDLRRLS RLVRDQLVYP LMATARQALG
CKLCVDPSSY VTEWPGDSAA AAFKELNKLS RVFKDQVAYP LITAARHAMA
KRIQLLPASF VCFQEPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
KRIQLLPSSF VCFQDPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
KRLQLLPTSF ICFQDSANVV GVYKDLQKLS RLFKDRLVYP LLAAARQALN
KRLQLLPESF ICKKLGENVA NIYKDLQKLS RLFKDQLVYP LLAFTRQALN
KRLQLLPESF ICKKLGENVA NIYKDLQKLS RLFKDQLVYP LLAFTRQALN
KRLQLLPESF ICREPGENVA KIYKDLQKLS RLFKDQLVYP LLAFTRQALN
KRLQLLPESF ICKESGENVA MIYKDLQKLS RLFKDQLVYP LLAFTRQALN
KRLQLLPESF ICKEPEEDVA KIYKDLQKLS RLFKDQLVYP LLAFTRQALN
KSVQLQPGSY VAAEPGESAA KVYKDLKKLS RLFKDQLVYP LLAFTRQVLN
KRLQLLPESF ICNEQEENVA RVYKDLQKLS RLFKDQLVYP LLAFARQALN
RKLQLSTNSY ISYETDNNIA SVYKDLQKLS GQFKDQVAYP LLAAARQVLN
PEDQESINGF TLPWKSPGMY KMTYRHMACE GSSCGLTMVP MGSLLMVHGV
LPLLFGLPVL PPELLLRLLR LLDVRSLVSL SAVCRHLNTA THDASLWRHL
LPVAFGLTAL PPELLLRVFR LLDVRSVVML SAVCRHFGAI TRDTALWRHL
LPDVFGLVVL PLELKLRIFR LLDVRSLISL SAVCRDLYAA SNDQLLWRFM
LPDVFGLLVL PLELKLRIFR LLDVRSLISL SAVCRDLYTA SNDQLLWRFM
LPDVFGLVVL PLELKLRIFR LLDFRSLLSL SAVCHDLYAA SNDQLLWRFI
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLFTA SNDPLLWRFL
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLFTA SNDPLLWRFL
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLLIA SNDQLLWRCL
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLCIT SNDQLLWRCL
LPDVFGLLVL PLELKLRILR LLDVRSVLSL SAVCHDLFIA SNDPLLWRCL
LPDVFGLVVL PLELKLRIFR LLDVHSVLAL SAVCHDLLIA SNDPLLWRCL
LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLLTA SNDQLLWRFM
LPDVFGLLVL PPELKLRIFR LLDIRSLLSL SATCKEFLAD TNDPSLWRFL
VTGSNPSMHH QVQLKTDSFT FHNVQDPDRV YKNLPQLSKV FKDSVAQPLL
LHRDFRVSFP AHRDTDWREL YKQKYRQRAA RRGRHWFYPP PISPLIPFPS
YCRDFRDSHA GSRDTDWKEV YRRSYKSRSA VRRSHECFLP PLYPNPRGVF
YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMFLPP PFHPNPFYPS
YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMMLLP PFHPNPFYPN
YLRDFRDPVA RSRDTDWKEL YKKKMKQKDA LRWRHMMFLP PLHPNPLYPN
YLRDFRDNTV RVQDTDWKEL YRKRHIQRES PKGRFVMLLP PFYPNPLHPR
YLRDFRDNTV RVQDTDWKEL YRKRHIQRES PKGRFVMLRP PFYPNPLHPR
YLRDFRDGTV RARDTDWKEL YRKRYKQREA QRARHMMFPP PFGPSPWHPR
YLRDFRDGSI RGRDTDWKEL YKKRYKQREA QRGRHVMFLP PFYPSPLHPR
YLRDFRDSTA RARDTDWKEL YKKRYKQRDA QRARHLVFLP PTHPIPFYPM
YLRDFRDGTV RGPDTDWKEL YRKKHIQREA QRMRHAMFLP PFCPIPVYPR
YLRDFRDSTS RSRDTDWKEL YRRKQKQRDA LRWRHTMFLP PFHPNPFYPS
CVRDFRNNLP RSLDMDWKKL YREKYKQKER SRFVRRHFLP ITHPYPYYPN
ADMRQVLGLP ALALSAEIQL YKKHYQDRAR EWMRHQRSFH PPWAIPTPPH
SPALYPPGII GDYDQMPILP RPRFHPIGPL PGMSAPV
TPPPPVPGII GEYDQRPILP RPRYDPMSPF PDLDRQP
PFPIYPPMVI GEYGERPSLI PPHFDPIGSL PGANPTL
PFPIYPPMII GEYDERPSLI PPHFDPIGSL PGANPML
PFPLYPPMII GEMDERPSLF PSHLDPFGSF QNPNPTL
PFPRLPPGII GEYDQRPSLI PPRFDPVGPL PGPNPIL
PFPRLPPGII GEYDQRPSLI PPRFDPVGPL PGPNPIL
PFPLLPPGII GEYDERLSLI PPRFDPVGPL PGPRPTL
PFPLHPPGII GEYDQRLSLI PPRFDPVGPL PGPNPIL
PFPLYPPGII GEYDVRPSLI PPRFDPIGPF PGPNPIL
AYLLLPPGII GEYDERPSLI PPRFDPVDPL PGPHSLL
PFPLYPPGVI GEYDERPSLI PPRFDPIGPL PGPSPGL
IFPNYPPGII GEYDQRPSFI PNPFKVTVPF SESDPSI
SFPTYPPGFI GDYDRYPMGR GPRFDPIGPL PEHQVIP
It generates a tree as follows:
______ Chicken
___________|
___| |________ Zebra
| |
_| |____________________ Anole
| |
| |_________________ Platypus
__________|
| | _____________________ Mouse
| | |
| |____| ___________ Elephant
| | |
| |_| , Human
_______| | _____________|
| | || | Chimpanzee
| | ||
| | | ______ Dog
| | |__|
________________| | |_________ Cow
| | |
| | |_____________________________________ Xenopus
| |
_| | ________________________________________ Fugu
| |_|
| |_________________________________________ Zebrafish
|
|________________________________________________________________ Amphioxus
My question: I need the taxa to be listed in hierarchical order from top to bottom, rather than from the inside outward. For example, for the tree above, I would like the following arrangement:
_______ Human
_______|
| |_______ Chimpanzee
_______|
| | _______ Dog
_______| |_______|
| | |_______ Cow
_______| |
| | |_______________________ Elephant
_______| |
| | |_______________________________ Mouse
| |
_______| |_______________________________________ Platypus
| |
| | _______________________________________ Anole_lizard
| |_______|
______| | _______________________________ Chicken
| | |_______|
| | |_______________________________ Zebra_finch
| |
| |_______________________________________________________ Xenopus
_|
| _______________________________________________________ Zebrafish
|______|
| |_______________________________________________________ Fugu
|
|______________________________________________________________ Amphioxus
The tree above is in the correct hierarchical order from top to bottom. Can anyone suggest how I can obtain this branching order? Should I add a method or function to the Biopython module?
As I said to you in your last question, in order to do this you should look at DendroPy.
The transform you are looking for is, first, to convert the tree to a cladogram (all branch lengths set to 1), and then to ladderise the nodes in either ascending or descending order.
Your anole/chicken node is ordered in the opposite way to the rest of the tree though.
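For what it's worth, Biopython's own tree objects also have a ladderize method, so a rough sketch of this transform without DendroPy might look like the following. The Newick string is a made-up toy tree standing in for pars_tree, and reverse=True (largest clades drawn first) is only my guess at the ordering you want.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical toy tree standing in for pars_tree from the question.
newick = "(A:0.1,(B:0.2,(C:0.3,(D:0.4,E:0.5):0.6):0.7):0.8);"
tree = Phylo.read(StringIO(newick), "newick")

# Step 1: turn the tree into a cladogram by giving every
# non-root branch a length of 1.
for clade in tree.find_clades():
    if clade is not tree.root:
        clade.branch_length = 1.0

# Step 2: ladderize in place. reverse=True sorts each node's children
# from the largest clade to the smallest, so the deepest clade is drawn
# at the top and the lone outgroup at the bottom.
tree.ladderize(reverse=True)

# Tips in the order they will be drawn, top to bottom.
tip_order = [tip.name for tip in tree.get_terminals()]
print(tip_order)
Phylo.draw_ascii(tree)
```

Calling tree.ladderize() with the default reverse=False gives the mirror-image ordering. Note that unit branch lengths alone do not line the tips up in a single column the way your hand-drawn tree does.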
Okay, thanks very much for your always helpful responses. I have a further question related to the tree topology above: if there is a polytomy (an unresolved node) in one of the branches, what does that indicate, and how can it be resolved?
To my knowledge polytomies are usually indicative of a situation where the sequences are too similar to be resolved.
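If it helps to see where a tree is unresolved, here is a small sketch that lists the polytomies in a Bio.Phylo tree by looking for internal nodes with more than two children. The Newick string is an invented example, not your data.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical toy tree: the clade (A,B,C) is a three-way polytomy.
newick = "((A,B,C),(D,E));"
tree = Phylo.read(StringIO(newick), "newick")

# An internal node with more than two children is unresolved.
polytomies = [
    clade for clade in tree.get_nonterminals() if len(clade.clades) > 2
]

# Report each polytomy by the tips that descend from it.
unresolved = [
    [tip.name for tip in clade.get_terminals()] for clade in polytomies
]
print(unresolved)

# is_bifurcating() is a quick yes/no check for the whole tree.
print(tree.is_bifurcating())
```

Bear in mind that a root with three children (common in unrooted trees) will also show up in this list, even though it is not usually considered a real polytomy.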
Usually the only solution is more data. You might be able to bootstrap the tree though and find a particular split topology that is more supported than the other.
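As a rough illustration of the bootstrapping idea, Biopython's Bio.Phylo.Consensus module can resample the alignment columns and annotate a reference tree with support values. This is only a sketch under assumptions: the four short sequences are invented, 50 replicates is an arbitrary number, and I use a neighbor-joining constructor to keep it fast (in principle the ParsimonyTreeConstructor from the question could be passed instead).

```python
import random

from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.Consensus import bootstrap_trees, get_support
from Bio.Phylo.TreeConstruction import (
    DistanceCalculator,
    DistanceTreeConstructor,
)
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

random.seed(0)  # only to make the toy run repeatable

# Invented toy alignment; replace with AlignIO.read(...) for real data.
toy = {
    "A": "ACDEFGHIKL",
    "B": "ACDEFGHIKM",
    "C": "ACDQFGHWKM",
    "D": "TCDQFGHWRM",
}
aln = MultipleSeqAlignment(
    [SeqRecord(Seq(seq), id=name) for name, seq in toy.items()]
)

# Any object with a build_tree(alignment) method can serve as constructor.
constructor = DistanceTreeConstructor(DistanceCalculator("identity"), "nj")
reference = constructor.build_tree(aln)

# Build one tree per resampled alignment, then count how often each
# clade of the reference tree appears among the replicates.
replicates = list(bootstrap_trees(aln, 50, constructor))
supported = get_support(reference, replicates)

for clade in supported.get_nonterminals():
    tips = sorted(tip.name for tip in clade.get_terminals())
    print(tips, clade.confidence)
```

Clades with low support are the ones that are effectively unresolved by the data, which is where more sequence data would help most.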
You might also want to search Google for “split decomposition trees”.
Thank you for your response. I will search for that.