Hey peeps!
I'm very new to phylogenetics and am currently trying to reconstruct a phylogenetic tree while getting started with all the different software and tools. First, I would like to reconstruct the phylogenetic tree from 138 cyanobacterial strains with 8 loci per strain (as done by the original authors of the study). Second, I would like to add another 4 strains to the tree, for which I have all loci except the 16S-ITS.
Here's a quote from the authors' methods:
"The sequences of the 16S rDNA, 16S rDNA-ITS, PC-IGS, PSA-IGS, RNaseP, rbcLX-IGS, and rpoC were concatenated resulting in 2,697 bp. Ambiguous sites (n = 93) were removed from the sequence alignment when approximating a continuous gamma distribution (ncatG = 5): alpha (gamma, K = 5) = 0.01712, Average Ts/Tv = 2.5996. Phylogenetic trees were constructed using (i) maximum likelihood (ML), (ii) neighbour-joining (NJ) from the nucleotide sequences distance matrix (calculated using the F84 substitution model), and (iii) maximum parsimony (MP) from nucleotide sequences using the PHYLIP package. Statistical significance of the branches was estimated by bootstrap analysis generating 1000 replicates of the original data set using the PHYLIP package. Finally, consensus trees following the 50% majority rule were computed."
First, I would like to understand how to remove the ambiguous sites as mentioned above, and which tool I need for that. How do I calculate ncatG, alpha, and Ts/Tv from a dataset myself, so that I can understand the authors' choices?
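To make the Ts/Tv part of the question concrete, here is a minimal sketch of how a raw, observed transition/transversion ratio could be counted from an aligned set of sequences. It is only an illustration: the function names and the toy sequences are made up, and this simple pairwise count is not the model-based estimate that phylogenetics packages report, so it should not be expected to reproduce the authors' value of 2.5996.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguity code
    (anything other than A, C, G, T) are skipped.
    """
    ts = tv = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv


def observed_ts_tv_ratio(sequences):
    """Pool Ts and Tv counts over all sequence pairs and return Ts/Tv."""
    total_ts = total_tv = 0
    for s1, s2 in combinations(sequences, 2):
        ts, tv = count_ts_tv(s1, s2)
        total_ts += ts
        total_tv += tv
    return total_ts / total_tv if total_tv else float("nan")


# Toy aligned sequences (hypothetical, for illustration only).
toy = ["ACGTACGT", "GCGTACGT", "ACGTACGA"]
print(observed_ts_tv_ratio(toy))
```

Counting differences this way ignores multiple substitutions at the same site, which is one reason a model-based estimate from a dedicated program will generally differ.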
Second, I would like to know whether I should change any of the above parameters, or the model used, when adding the 4 additional strains that lack the 16S-ITS locus. Will the additional 4 strains probably have little effect on this, and will the missing 16S-ITS locus be problematic? The strains all belong to a well-established monophyletic genus.
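For the missing locus, one common way to keep such strains in a concatenated alignment is to pad the absent locus with a missing-data character. The sketch below assumes each locus is already aligned and stored as a dictionary of strain name to sequence; the function name, the strain names, and the choice of "?" as the missing-data symbol are assumptions for illustration, and whether a given program accepts that symbol needs to be checked in its documentation.

```python
def concatenate_loci(loci, locus_order, missing_char="?"):
    """Concatenate per-locus alignments into one sequence per strain.

    loci: {locus_name: {strain_name: aligned_sequence}}
    Strains absent from a locus are padded with `missing_char`
    over the full width of that locus alignment.
    """
    strains = sorted({s for name in locus_order for s in loci[name]})
    parts = {s: [] for s in strains}
    for name in locus_order:
        alignment = loci[name]
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"locus {name!r} is not aligned to one length")
        width = widths.pop()
        for s in strains:
            parts[s].append(alignment.get(s, missing_char * width))
    return {s: "".join(chunks) for s, chunks in parts.items()}


# Hypothetical toy data: strainB lacks the 16S-ITS locus.
toy_loci = {
    "16S-ITS": {"strainA": "ACGT"},
    "rpoC": {"strainA": "GG", "strainB": "GA"},
}
print(concatenate_loci(toy_loci, ["16S-ITS", "rpoC"]))
```

This only shows the bookkeeping; it says nothing about how much the missing locus affects the placement of the 4 strains.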
cheers
Does anybody have an idea?
Your question is not specific enough. You are basically asking to be taught phylogenomics from A to Z. You should consider taking a bioinformatics course that goes through the basics first, or perhaps working through some tutorials.
Thanks for your reply. I know the topic is complicated, but maybe I just need a little hint to point me in the right direction. Actually, I have decided to reconstruct just the ML tree as the authors did, without adding the 4 additional strains with the missing locus, so basically I just need to repeat what the authors did.
I think this should be doable, as I have all the parameters specified by the authors. What confuses me is the removal of ambiguous characters by the method specified. I'm used to removing ambiguous characters (gaps) by simply deleting them, but I have never heard of removing characters when approximating a continuous gamma distribution (with ncatG, alpha, Ts/Tv, etc.). Really, I just need a hint as to which software/executable to run, where I can specify these parameters to remove the ambiguous sites and then construct just the ML tree. I guess it is calculated with the F84 model? In the authors' methods this model is only mentioned in parentheses next to the NJ tree, but as far as I know one always has to specify a substitution model, so I assume this model is meant to be used for the ML tree too?
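For reference, the "delete the ambiguous characters" approach I am used to can be sketched as below: every alignment column containing a gap or an ambiguity code in any sequence is dropped. This is only my assumption of what simple site removal looks like, not necessarily what the authors did to arrive at their 93 removed sites; the function name and the toy alignment are hypothetical.

```python
def strip_ambiguous_columns(alignment, allowed="ACGT"):
    """Drop every column where any sequence has a character outside `allowed`.

    alignment: {sequence_name: aligned_sequence}
    Returns the cleaned alignment and the number of columns removed.
    """
    names = list(alignment)
    seqs = [alignment[n].upper() for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences are not aligned to the same length")
    # Keep only the columns in which every sequence has an unambiguous base.
    keep = [i for i in range(length) if all(s[i] in allowed for s in seqs)]
    cleaned = {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
    return cleaned, length - len(keep)


# Hypothetical toy alignment with a gap, an N, and an R ambiguity code.
toy = {"a": "ACG-TN", "b": "ACGTTA", "c": "ACRTTA"}
cleaned, removed = strip_ambiguous_columns(toy)
print(cleaned, removed)
```

On a real dataset the alignment would first be read from a FASTA or PHYLIP file; that step is left out here to keep the sketch self-contained.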
Primarily, I would really just like to reconstruct the ML tree and then try to understand all the steps by myself (hopefully).
cheers