Question

reconstructing a phylogenetic multilocus tree

0

8.7 years ago

holzdan ▴ 20

Hey peeps!

I'm very new to phylogenetics and am just getting started with all the different software and tools. First, I would like to reconstruct the phylogenetic tree from 138 cyanobacterial strains with 8 loci per strain (as done by the original authors of the study). Second, I would like to add another 4 strains to the tree, for which I have all loci except the 16S-ITS.

Here's a quote from the authors' methods:

"The sequences of the 16S rDNA, 16S rDNA-ITS, PC-IGS, PSA-IGS, RNaseP, rbcLX-IGS, and rpoC were concatenated resulting in 2,697 bp. Ambiguous sites (n = 93) were removed from the sequence alignment when approximating a continuous gamma distribution (ncatG = 5): alpha (gamma, K = 5) = 0.01712, Average Ts/Tv = 2.5996. Phylogenetic trees were constructed using (i) maximum likelihood (ML), (ii) neighbour-joining (NJ) from the nucleotide sequences distance matrix (calculated using the F84 substitution model), and (iii) maximum parsimony (MP) from nucleotide sequences using the PHYLIP package. Statistical significance of the branches was estimated by bootstrap analysis generating 1000 replicates of the original data set using the PHYLIP package. Finally, consensus trees following the 50% majority rule were computed."

First, I would like to understand how (and with which tool) to remove the ambiguous sites as described above. How do I calculate ncatG, alpha, and Ts/Tv from a dataset myself, so that I can understand the authors' choices?

Second, I would like to know whether I should change any of the above parameters or the model used when adding the 4 additional strains, which lack the 16S-ITS locus. Is it likely that the 4 additional strains will have little effect on these, and will the missing 16S-ITS locus be problematic? The strains all belong to a well-established monophyletic genus.

cheers

alignment • 2.6k views

updated 8.7 years ago by Brice Sarver ★ 3.8k • written 8.7 years ago by holzdan ▴ 20

0

Does anybody have an idea?

8.7 years ago by holzdan ▴ 20

2

Your question is not specific enough. You are basically asking to be taught phylogenomics from A to Z. You should consider first taking a bioinformatics course that covers the basics, or perhaps working through tutorials.

8.7 years ago by Adrian Pelin ★ 2.6k

0

Thanks for your reply. I know the topic is complicated, but maybe I just need a little hint to point me in the right direction. I have actually decided to reconstruct just the ML tree as the authors did, without adding the 4 additional strains that lack a locus, so basically I just need to repeat what the authors did.

I think this should be doable, as I have all the parameters specified by the authors. What confuses me is removing the ambiguous characters by the method specified. I'm used to removing ambiguous characters (gaps) by simply deleting them, but I have never heard of removing characters when approximating a continuous gamma distribution (with ncatG, alpha, Ts/Tv, etc.). I just need a hint as to which software/executable lets me specify these parameters to remove the ambiguous sites and then construct (just) the ML tree. I guess it is calculated with the F84 model? In the authors' methods this model is only mentioned in parentheses next to the NJ tree, but as far as I know one always has to specify a substitution model, so I assume it is meant to be used for the ML tree too.

I would really just like to reconstruct the ML tree first and then try to understand all the steps by myself (hopefully).

cheers

8.7 years ago by holzdan ▴ 20

score 1 · Answer 1 · 2016-03-08

Phylogenetics is one of the most difficult subfields of biology. Unfortunately, you can either blindly stumble through an analysis and get a tree that will certainly not get past a knowledgeable reviewer, or you can spend some time learning how to do it correctly. I have made posts on Biostars outlining the general approach, but it requires you to do a lot of different steps that will seem confusing if you don't have a solid foundation in computational genetics.

Start with Joe Felsenstein's classic book Inferring Phylogenies. The tools people use will make sense after reading it. If you are in more of a rush, try reading manuals and papers associated with major software packages, such as BEAST, MrBayes, and PAUP*.
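To make the question's first step a bit more concrete, here is a minimal Python sketch of what "removing ambiguous sites" and "average Ts/Tv" could mean in practice. It assumes that an ambiguous site is any alignment column containing a gap or a non-ACGT character, which may not be the definition the original authors used. The function names `drop_ambiguous_columns` and `observed_ts_tv` are made up for illustration and do not come from PHYLIP or any other package. The ratio it reports is a raw count over all sequence pairs, not the maximum-likelihood estimate quoted in the paper, and the gamma shape parameter alpha cannot be obtained this way at all; that has to be estimated by ML software.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def drop_ambiguous_columns(seqs):
    """Keep only the alignment columns in which every sequence has A, C, G or T.

    `seqs` maps a strain name to its aligned sequence (all the same length).
    Assumption: gaps and IUPAC ambiguity codes both count as ambiguous.
    """
    rows = {name: seq.upper() for name, seq in seqs.items()}
    columns = zip(*rows.values())
    keep = [i for i, col in enumerate(columns) if set(col) <= UNAMBIGUOUS]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}


def observed_ts_tv(seqs):
    """Count transitions and transversions over all pairs of sequences.

    Returns (transitions, transversions). This is an uncorrected count,
    not a model-based estimate.
    """
    ts = tv = 0
    for a, b in combinations(seqs.values(), 2):
        for x, y in zip(a, b):
            if x == y or x not in UNAMBIGUOUS or y not in UNAMBIGUOUS:
                continue
            pair = {x, y}
            if pair <= PURINES or pair <= PYRIMIDINES:
                ts += 1  # A<->G or C<->T
            else:
                tv += 1  # purine <-> pyrimidine
    return ts, tv


if __name__ == "__main__":
    # Toy alignment, not real data.
    toy = {"s1": "ACGTNAG", "s2": "ACATGAA", "s3": "AC-TGCG"}
    clean = drop_ambiguous_columns(toy)
    ts, tv = observed_ts_tv(clean)
    print(clean)
    print(ts, tv, ts / tv if tv else float("nan"))
```

Comparing the number of columns this removes with the 93 sites reported in the methods is a quick sanity check on whether you are working from the same alignment as the authors.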