Hierarchical order of tree branches using Biopython
6.0 years ago
mdsiddra

I am using Biopython (version 1.72) to reconstruct a parsimony tree from multiple aligned amino acid sequence files. As example data, I used a PHYLIP-format file of aligned protein sequences. The following code runs successfully and produces an evolutionary tree:

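from Bio import AlignIO, Phylo
from Bio.Phylo.TreeConstruction import (
    DistanceCalculator,
    DistanceTreeConstructor,
    NNITreeSearcher,
    ParsimonyScorer,
    ParsimonyTreeConstructor,
)

# Read the aligned protein sequences (PHYLIP format); AlignIO accepts a filename
aln = AlignIO.read('example.phy', 'phylip')

# Neighbour-joining tree from pairwise identity distances
calculator = DistanceCalculator('identity')
dm = calculator.get_distance(aln)
constructor = DistanceTreeConstructor()
njtree = constructor.nj(dm)

# Use the NJ tree as the starting tree for a parsimony NNI search
scorer = ParsimonyScorer()
searcher = NNITreeSearcher(scorer)
constructor = ParsimonyTreeConstructor(searcher, njtree)
pars_tree = constructor.build_tree(aln)

Phylo.draw_ascii(pars_tree)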

Below is the example alignment file in PHYLIP format:

 14 387
Zebrafish  ESLLRFGLRS DLDFRLSLNG KEDLLDTGQS LSSCGVVSGD LISVILPASS
Fugu       ETVLSVGLSA ETEISLSLNG SEPLEDTGQT LASCGIVSGD LIRVALIRAA
Chicken    RALLAWGYSS DTEFSITLNG KDALTEDEKT LASYGIVPGD LICLLLEETD
Zebra      SMTENRTAGS DTAFSVTLNR KDALTEDQKT LASYGIVSGD LICLLLEEPD
Anole      QALLSWGYSS ETKFEITLNN KDSLVGDQDT LASFGIVSGD LICLILEDDA
Human      QSLLTWGYSS NTRFTITLNY KDPLTGDEET LASYGIVSGD LICLILQDDI
Chimpanzee QSLLTWGYSS NTRFTITLNY KDPLTGDEET LASYGIVSGD LICLILQDDI
Dog        QALLSWGFGS DTRFAITLNN KDALTGDEET LASYGIVSGD LICLILEDAI
Cow        QALPTWGYSS DARFAITLNN KDALTGDEET LASYGIVSGD LLCLILEDAI
Elephant   QALLTWGYSS DTRFAITLNN KDALTEDEET LASYGIVSGD LICLILEDAI
Mouse      QVLLTLGFSS DTRFAITLNN KDALTGDEET LASYGIVSGD LICLVLEDDM
Platypus   MDMSWTTLSP DTRFSITLNK KDALTEDNET LASYGIVCGD LICVVLEDAS
Xenopus    SVTLALGYST EANFTITLNG KDALTGDQNT LESAGIISGD LIVVVLPDSQ
Amphioxus  PVLGQYGLGD DMPFEISLNG RDALLGDDKP LSDLGIVSGD LIHILLASVD

           QTSSAAHQTH TDQQSSQECV DLQQDCMDQQ QQQEQECVCA AAPPLLCCEA
           DAPDRDDGGG HSEQVSQEAK LPDASGASTD SDQAPGPAAS CWEPMLCSET
           LPPPSSSPPS LQNGKNGSSL EFPSGLVPED VDLEEGTGSY PSEPMLCSEA
           LPPPPATPAP LQNGNNGSSL EFPSGLVPED ADLEEGTGSY PSEPMLCSEA
           SSPSSSLPSS QSNHHSGPSQ EFTSEGGPDD LDLQEATGSF PSEPMLCCEA
           IPSSTSEHSS LQNNSNGPSQ NFEAESIQDN AHMAEGTGFY PSEPMLCSES
           IPSSTSEHSS LQNNSNGPSQ NFEAESIQDN AHMAEGTGFY PSEPMLCSES
           LPSSTSEHSS LQNNSNGPSQ NFEAGSVQDV VDMEEGTGVF PLEPMLCSES
           LPSSTSEHSS VQNNSSGPGQ HFEAEAVPDV VDVEEGTGYY LAEPMLCSES
           LPSSTSEHSS LQNNSNGPSQ DVEVESVQDT VDVEEGTGFY PSEPMLCSEA
           LPSSTTEHSS LQDNPSGPSQ NVEAESIQDA MSMEEVSGFH PLEPMLCNET
           LPSPTSALSG LQDRSEGTSL EFDPGLTQHD MDLEETAGSY FFEPMLCSEA
           QPAPAAPDKR DLGCCSMAQA AQQAPPSAED VAMEDGMSGP AWEVMLCSEA
           NHNTQQGQHP SSPERATEHA ASCGNDTKLA TSKSDTTKAD PDKALSKNPV

           EDGLLPLALE RLLDSSTCRS PSDCLMLALH LLLLETGFIP QGGAVSSGEM
           DEGQAPWSLE LLYHSAQVSG PGDALVVAAN LLMIETGFSP QDSQLKPAEM
           ADGEIPHSLE VLYLSAECTS ATDALIVLVH LLMMETGYVP QGTEAKAVSM
           ADGETPHSLE MLYLSAECTS ATDALIVLVH LLMMETGYVP QGIEAKAVFM
           TDGQVPHSLQ TLYHSAECTN ANDALIVSIH LIMMETGYVP QGTEAKASSM
           VEGQVPHSLE TLYQSADCSD ANDALIVLIH LLMLESGYIP QGTEAKALSM
           VEGQVPHSLE TLYQSADCSD ANDALIVLIH LLMLESGYIP QGTEAKALSM
           VEGQVPHSLE ILYQSADCSN PNDALIISIH LLMLESGYIP QGTEAKALSM
           VEGQVPHSLE ILYQSADCLN PCDALIVSIH LLMLESGYIP QGTEAKAVSM
           VEGQVPHSLE TLYHAAGCSD ASDALIVLIH LLMLESGYIP QGTEAKAVSM
           EDGQVPHSLE TLYQSAGCSN ISDALIVLVH LLMLESGYIP QGTETKAVTM
           VDGQVPHSLE TLYHSAECTG TNDALIVLVH LLMLESGYLP LGTEAKAGSM
           LDGKIPHSLE VLYQTAGCTG ASDAFIVAIH LLMLETGYLQ KGAESKVLCM
           NKDEKETVVV DTEESAVNDG DGDKEVLMDV EEEEAVSLSM EGAGVGGIPS

           PIGWQAAGVF RLQYVHPLLE NSLVSVVAVP MGQTLVINAV LKMETSLENS
           PAGWRCGGVY KLQYSHRLCG DSVVVMVAVS MGSALIINGL LEVNQSADSV
           PEKWRGNGVY KLQYTHPLCE EGSAGLTCVP LGDLVAINAT LKINREIKGV
           PEKWRGNGVY KLQYTHPLCG EGCAGLTCVP LGDLIAINAT LKINEEIRSV
           PENWRNKGVY KLLYTHPLCE NGFAVLTCVP LGNLIVVNAM LKITSDIKSV
           PEKWKLSGVY KLQYMHPLCE GSSATLTCVP LGNLIVVNAT LKINNEIRSV
           PEKWKSSGVY KLQYMHPLCE GSSATLTCVP LGNLIVVNAT LKINNEIRSV
           PENWKSGGVY KLQYTHPLCE GGSAALTCVP LGTLIVINAT LKINTELRSV
           PQNWRLGGVY KLQYTHPLCE GGSAALTCVP LGNLIVINAT LKINSEVRSV
           PEKWKSGGVY KLQYTHPLCN GGSAALTCVP LGKLIVVNAT LKINSEIRSV
           PEKWKSSGVY KLQYTHPLCE GGFAVLTCVP LGNLIIINAT IKVNGGIKNV
           PEKWRSGGVY KLQYTHPLCE DGCAALICVP LGNLIVINAT LKINNEIRSV
           PQDWRSGGAY RLHYTHPLCA EVSATLACLP MGKLVIINAT LKINSEMKSV
           EPMLCREAVD SVPLVLQQVY RANDVKSRHD ALCMVLHVME SGFSAKEPSS

           RKLLLKPDEY VTAWTGGSSG VVYRDLRRLS RLVRDQLVYP LMATARQALG
           CKLCVDPSSY VTEWPGDSAA AAFKELNKLS RVFKDQVAYP LITAARHAMA
           KRIQLLPASF VCFQEPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
           KRIQLLPSSF VCFQDPEKVA GVYKDLQKLS RLFKDQLVYS LLAAARQALN
           KRLQLLPTSF ICFQDSANVV GVYKDLQKLS RLFKDRLVYP LLAAARQALN
           KRLQLLPESF ICKKLGENVA NIYKDLQKLS RLFKDQLVYP LLAFTRQALN
           KRLQLLPESF ICKKLGENVA NIYKDLQKLS RLFKDQLVYP LLAFTRQALN
           KRLQLLPESF ICREPGENVA KIYKDLQKLS RLFKDQLVYP LLAFTRQALN
           KRLQLLPESF ICKESGENVA MIYKDLQKLS RLFKDQLVYP LLAFTRQALN
           KRLQLLPESF ICKEPEEDVA KIYKDLQKLS RLFKDQLVYP LLAFTRQALN
           KSVQLQPGSY VAAEPGESAA KVYKDLKKLS RLFKDQLVYP LLAFTRQVLN
           KRLQLLPESF ICNEQEENVA RVYKDLQKLS RLFKDQLVYP LLAFARQALN
           RKLQLSTNSY ISYETDNNIA SVYKDLQKLS GQFKDQVAYP LLAAARQVLN
           PEDQESINGF TLPWKSPGMY KMTYRHMACE GSSCGLTMVP MGSLLMVHGV

           LPLLFGLPVL PPELLLRLLR LLDVRSLVSL SAVCRHLNTA THDASLWRHL
           LPVAFGLTAL PPELLLRVFR LLDVRSVVML SAVCRHFGAI TRDTALWRHL
           LPDVFGLVVL PLELKLRIFR LLDVRSLISL SAVCRDLYAA SNDQLLWRFM
           LPDVFGLLVL PLELKLRIFR LLDVRSLISL SAVCRDLYTA SNDQLLWRFM
           LPDVFGLVVL PLELKLRIFR LLDFRSLLSL SAVCHDLYAA SNDQLLWRFI
           LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLFTA SNDPLLWRFL
           LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLFTA SNDPLLWRFL
           LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLLIA SNDQLLWRCL
           LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLCIT SNDQLLWRCL
           LPDVFGLLVL PLELKLRILR LLDVRSVLSL SAVCHDLFIA SNDPLLWRCL
           LPDVFGLVVL PLELKLRIFR LLDVHSVLAL SAVCHDLLIA SNDPLLWRCL
           LPDVFGLVVL PLELKLRIFR LLDVRSVLSL SAVCRDLLTA SNDQLLWRFM
           LPDVFGLLVL PPELKLRIFR LLDIRSLLSL SATCKEFLAD TNDPSLWRFL
           VTGSNPSMHH QVQLKTDSFT FHNVQDPDRV YKNLPQLSKV FKDSVAQPLL

           LHRDFRVSFP AHRDTDWREL YKQKYRQRAA RRGRHWFYPP PISPLIPFPS
           YCRDFRDSHA GSRDTDWKEV YRRSYKSRSA VRRSHECFLP PLYPNPRGVF
           YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMFLPP PFHPNPFYPS
           YLRDFRDPIA RPRDTDWKEL YKKKLKQKEA LRWRHMMLLP PFHPNPFYPN
           YLRDFRDPVA RSRDTDWKEL YKKKMKQKDA LRWRHMMFLP PLHPNPLYPN
           YLRDFRDNTV RVQDTDWKEL YRKRHIQRES PKGRFVMLLP PFYPNPLHPR
           YLRDFRDNTV RVQDTDWKEL YRKRHIQRES PKGRFVMLRP PFYPNPLHPR
           YLRDFRDGTV RARDTDWKEL YRKRYKQREA QRARHMMFPP PFGPSPWHPR
           YLRDFRDGSI RGRDTDWKEL YKKRYKQREA QRGRHVMFLP PFYPSPLHPR
           YLRDFRDSTA RARDTDWKEL YKKRYKQRDA QRARHLVFLP PTHPIPFYPM
           YLRDFRDGTV RGPDTDWKEL YRKKHIQREA QRMRHAMFLP PFCPIPVYPR
           YLRDFRDSTS RSRDTDWKEL YRRKQKQRDA LRWRHTMFLP PFHPNPFYPS
           CVRDFRNNLP RSLDMDWKKL YREKYKQKER SRFVRRHFLP ITHPYPYYPN
           ADMRQVLGLP ALALSAEIQL YKKHYQDRAR EWMRHQRSFH PPWAIPTPPH

           SPALYPPGII GDYDQMPILP RPRFHPIGPL PGMSAPV
           TPPPPVPGII GEYDQRPILP RPRYDPMSPF PDLDRQP
           PFPIYPPMVI GEYGERPSLI PPHFDPIGSL PGANPTL
           PFPIYPPMII GEYDERPSLI PPHFDPIGSL PGANPML
           PFPLYPPMII GEMDERPSLF PSHLDPFGSF QNPNPTL
           PFPRLPPGII GEYDQRPSLI PPRFDPVGPL PGPNPIL
           PFPRLPPGII GEYDQRPSLI PPRFDPVGPL PGPNPIL
           PFPLLPPGII GEYDERLSLI PPRFDPVGPL PGPRPTL
           PFPLHPPGII GEYDQRLSLI PPRFDPVGPL PGPNPIL
           PFPLYPPGII GEYDVRPSLI PPRFDPIGPF PGPNPIL
           AYLLLPPGII GEYDERPSLI PPRFDPVDPL PGPHSLL
           PFPLYPPGVI GEYDERPSLI PPRFDPIGPL PGPSPGL
           IFPNYPPGII GEYDQRPSFI PNPFKVTVPF SESDPSI
           SFPTYPPGFI GDYDRYPMGR GPRFDPIGPL PEHQVIP

It generates a tree as follows:

                                                        ______ Chicken
                                            ___________|
                                        ___|           |________ Zebra
                                       |   |
                                      _|   |____________________ Anole
                                     | |
                                     | |_________________ Platypus
                           __________|
                          |          |     _____________________ Mouse
                          |          |    |
                          |          |____|  ___________ Elephant
                          |               | |
                          |               |_|              , Human
                   _______|                 | _____________|
                  |       |                 ||             | Chimpanzee
                  |       |                 ||
                  |       |                  |   ______ Dog
                  |       |                  |__|
  ________________|       |                     |_________ Cow
 |                |       |
 |                |       |_____________________________________ Xenopus
 |                |
_|                |  ________________________________________ Fugu
 |                |_|
 |                  |_________________________________________ Zebrafish
 |
 |________________________________________________________________ Amphioxus

My question is: I need the taxa to be drawn in hierarchical order from top to bottom, instead of from the inner to the outer direction. For example, for the tree above, I would like to obtain the following arrangement:

                                                         _______ Human
                                                 _______|
                                                |       |_______ Chimpanzee
                                         _______|
                                        |       |        _______ Dog
                                 _______|       |_______|
                                |       |               |_______ Cow
                         _______|       |
                        |       |       |_______________________ Elephant
                 _______|       |
                |       |       |_______________________________ Mouse
                |       |
         _______|       |_______________________________________ Platypus
        |       |
        |       |        _______________________________________ Anole_lizard
        |       |_______|
  ______|               |        _______________________________ Chicken
 |      |               |_______|
 |      |                       |_______________________________ Zebra_finch
 |      |
 |      |_______________________________________________________ Xenopus
_|
 |       _______________________________________________________ Zebrafish
 |______|
 |      |_______________________________________________________ Fugu
 |
 |______________________________________________________________ Amphioxus

The tree above is in the correct hierarchical order from top to bottom. Can anyone suggest what I should do to obtain this branching order? Should I add a method or function to the Biopython module?

biopython phylogenetics

As I said in reply to your last question, you should look at DendroPy to do this.

The transformation you are looking for is, first, to convert the tree to a cladogram (all branch lengths set to 1), and then to ladderize the nodes in either ascending or descending order. Ladderizing sorts the children of every node by the number of tips they contain.
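DendroPy is the suggestion here, but Bio.Phylo trees also have a `ladderize(reverse=...)` method that sorts clades in place by their number of terminals, so the two steps can be sketched in Biopython itself. This is a minimal sketch on a small hypothetical Newick tree (written with the smaller clade first at every node) standing in for `pars_tree`; it is not the only way to get this layout.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical stand-in for pars_tree, with the smaller clade listed first everywhere
newick = "(Amphioxus,((Anole,(Chicken,Zebra)),(Platypus,(Mouse,(Human,Chimpanzee)))));"
tree = Phylo.read(StringIO(newick), "newick")

# Step 1: make a cladogram by giving every branch the same length (1)
for clade in tree.find_clades():
    clade.branch_length = 1

# Step 2: ladderize in place; reverse=True sorts each node's children so the
# child with the most terminals comes first, i.e. is drawn at the top
tree.ladderize(reverse=True)

Phylo.draw_ascii(tree)
print([leaf.name for leaf in tree.get_terminals()])
# ['Human', 'Chimpanzee', 'Mouse', 'Platypus', 'Chicken', 'Zebra', 'Anole', 'Amphioxus']
```

Applied to the question, the two steps would go between `constructor.build_tree(aln)` and `Phylo.draw_ascii(pars_tree)`. Two caveats: with unit branch lengths `draw_ascii` places tips at different columns rather than right-aligning them as in the hand-drawn target, and the top-to-bottom order also depends on where the tree is rooted; if Amphioxus should be the outgroup, calling `pars_tree.root_with_outgroup("Amphioxus")` before ladderizing is one option. Because the sort is applied at every node, Chicken/Zebra end up above Anole.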

Your Anole/Chicken node is ordered the opposite way to the rest of the tree, though.
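To see what this means, here is a minimal stdlib-only sketch: the target tree from the question transcribed as nested tuples (a hypothetical representation, not a Biopython object, with leaves as name strings), plus a helper that flags every node whose children are not ordered largest-first by tip count.

```python
def count_tips(node):
    """Number of leaves below a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return 1
    return sum(count_tips(child) for child in node)


def out_of_order_nodes(node):
    """Yield internal nodes whose children are not sorted by tip count, largest first."""
    if isinstance(node, str):
        return
    sizes = [count_tips(child) for child in node]
    if sizes != sorted(sizes, reverse=True):
        yield node
    for child in node:
        yield from out_of_order_nodes(child)


# The desired tree from the question, built bottom-up as nested tuples
primates_laurasiatheria = (("Human", "Chimpanzee"), ("Dog", "Cow"))
mammals = (((primates_laurasiatheria, "Elephant"), "Mouse"), "Platypus")
sauropsids = ("Anole_lizard", ("Chicken", "Zebra_finch"))
tetrapods = ((mammals, sauropsids), "Xenopus")
desired = (tetrapods, ("Zebrafish", "Fugu"), "Amphioxus")

for node in out_of_order_nodes(desired):
    print("not in descending ladder order:", node)
```

Only the Anole_lizard/(Chicken, Zebra_finch) node is reported: it lists the 1-tip child before the 2-tip child, while every other node lists its larger child first.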

Okay, thanks very much for your always helpful responses. I would like to ask one more thing, related to the tree topology above: if a branch has a polytomy (the node is not resolved), what does that indicate, and how can it be resolved?

To my knowledge, polytomies usually indicate that the sequences are too similar for the branching order to be resolved.
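A simplified, parsimony-flavoured way to see why: among three taxa, a grouping such as (A, B) is only supported by sites where A and B share a residue that C lacks. If no site favours one pair over the others, there is nothing to choose between the three possible resolutions. This is a minimal three-taxon sketch with hypothetical helper names, not what Biopython's `ParsimonyScorer` does internally.

```python
def pair_support(seqs, a, b, c):
    """Count aligned sites where taxa a and b share a residue that c lacks."""
    return sum(x == y != z for x, y, z in zip(seqs[a], seqs[b], seqs[c]))


def resolve_triplet(seqs, a, b, c):
    """Return (best_pair, support); best_pair is None when the top count is tied."""
    support = {
        (a, b): pair_support(seqs, a, b, c),
        (a, c): pair_support(seqs, a, c, b),
        (b, c): pair_support(seqs, b, c, a),
    }
    best = max(support.values())
    winners = [pair for pair, count in support.items() if count == best]
    return (winners[0] if len(winners) == 1 else None), support


# Hypothetical taxa that differ only at the last site, each with its own residue:
# no site groups any two of them together, so the triplet stays unresolved
toy = {"TaxonA": "QSLLTWGYSS", "TaxonB": "QSLLTWGYSA", "TaxonC": "QSLLTWGYSG"}
print(resolve_triplet(toy, "TaxonA", "TaxonB", "TaxonC"))

# First 20 columns of three rows of example.phy: Human and Chimpanzee are
# identical here, so every site where Dog differs supports grouping them
real = {
    "Human": "QSLLTWGYSSNTRFTITLNY",
    "Chimpanzee": "QSLLTWGYSSNTRFTITLNY",
    "Dog": "QALLSWGFGSDTRFAITLNN",
}
print(resolve_triplet(real, "Human", "Chimpanzee", "Dog"))
```

The toy triplet returns `None` (all three supports are 0), while the real excerpt resolves (Human, Chimpanzee) with 7 supporting sites.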

Usually the only solution is more data. You might be able to bootstrap the tree, though, and find a particular split topology that is better supported than the others.
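Biopython's `Bio.Phylo.Consensus` module provides `bootstrap_trees` (resample alignment columns and build one tree per replicate) and `get_support` (annotate a reference tree with the percentage of replicates containing each split). This is a minimal sketch assuming a five-taxon excerpt of the example alignment and a neighbour-joining constructor for speed. A `ParsimonyTreeConstructor` also has `build_tree` and could be passed instead, at a much higher cost.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.Consensus import bootstrap_trees, get_support
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Toy alignment: the first 20 columns of five rows of example.phy.
# In practice you would pass the full `aln` read with AlignIO.
aln = MultipleSeqAlignment([
    SeqRecord(Seq("QSLLTWGYSSNTRFTITLNY"), id="Human"),
    SeqRecord(Seq("QSLLTWGYSSNTRFTITLNY"), id="Chimpanzee"),
    SeqRecord(Seq("QALLSWGFGSDTRFAITLNN"), id="Dog"),
    SeqRecord(Seq("QALPTWGYSSDARFAITLNN"), id="Cow"),
    SeqRecord(Seq("RALLAWGYSSDTEFSITLNG"), id="Chicken"),
])

# Neighbour joining on identity distances, used for the reference tree
# and for every bootstrap replicate
constructor = DistanceTreeConstructor(DistanceCalculator("identity"), "nj")
reference = constructor.build_tree(aln)

# Resample alignment columns with replacement 100 times, one tree per replicate
replicates = list(bootstrap_trees(aln, 100, constructor))

# Set each internal clade's confidence to the percentage of replicates that
# contain the same split (clades no replicate recovers are left without a value)
reference = get_support(reference, replicates)

for clade in reference.find_clades(terminal=False):
    names = sorted(leaf.name for leaf in clade.get_terminals())
    print(names, clade.confidence)
```

Because resampling is random, the percentages change from run to run. A split that sits close to 100 is well supported, while competing splits that each appear in only some replicates are the situation in which a polytomy is the honest summary.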

You might also want to search Google for “split decomposition trees”.

Thank you for your response. I will search for it that way.
