Hi,
We are currently screening different kinds of samples for lactobacilli. I have isolated around 40 strains, sequenced a fragment of the 16S gene from each, and BLASTed the sequences to identify the species.
I'd like to build some kind of phylogenetic tree. To do so, I have downloaded, in FASTA format, all lactobacilli 16S sequences available in European databases. I'd like to build a tree from these sequences that also includes the ones I sequenced, so that I can place mine relative to the downloaded ones. I thought about using the neighbour-joining algorithm, but I'm not sure.
I'm pretty new to building phylogenetic trees, so I'd appreciate any tips and a recommendation for a good tool for Unix systems or Windows. I have also done a bit of research in forums and found this topic ( What is the fastest way and software to build phylogenetic trees from WGS NGS data). Would the method described in the answers there be valid?
Thank you :)
In addition, I would suggest running a test for the optimal substitution model, e.g. ModelTest. According to a recent article by Tan et al. (2015), automatic filtering of alignments (Gblocks, trimAl) does not improve phylogenetic trees.
Interesting paper indeed. What do you mean by substitution model?
What I meant is explained here: http://www.molecularevolution.org/resources/models/nucleotide
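To make the idea concrete, here is a minimal Python sketch of the simplest nucleotide substitution model, Jukes-Cantor (JC69), which assumes equal base frequencies and equal rates for every substitution. It converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The function names and the toy sequences are made up for illustration; tools such as ModelTest compare much richer models than this one.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def jc69_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy aligned fragments (hypothetical, not real 16S data): 1 difference in 10 sites.
seq_a = "ACGTACGTAC"
seq_b = "ACGTTCGTAC"
p = p_distance(seq_a, seq_b)
d = jc69_distance(p)
print(f"observed p = {p:.3f}, JC69 distance = {d:.3f}")
```

The corrected distance is always at least as large as the observed one, because the model accounts for multiple substitutions hitting the same site. Choosing a substitution model means choosing which correction of this kind best fits your data.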
Thank you, this indeed helped me :)
Although I haven't gone through the paper, my guess is that it depends. In my experience, cleaning alignments has solved problems in phylogenetic reconstruction, and this makes perfect sense from a theoretical perspective: any extraneous sequence or badly aligned region can confound phylogenetic reconstruction. However, if done automatically, there will probably be many cases in which part of the phylogenetic signal is lost during cleaning, which would explain the worse performance of automatic cleaning. I should read the paper, though :-)
Hi, I downloaded around 3000 sequences; the ones I sequenced are 40 in total. I tried a few web services, but I cannot upload such a large sequence set. So far I have aligned all the sequences together using MUSCLE. So would neighbour joining be the first option, followed by a likelihood approach using the programs you mentioned? Also, I fear that so many sequences will make an unclear tree with too much information, and I don't know how to cope with this. I just want to place my sequences on some kind of "reference tree" made from the sequences I downloaded.
RAxML can handle thousands of sequences, but the resulting tree would be harder to interpret.
You can try some software to remove sequence redundancy. With cd-hit you can remove sequences that are X% identical. That would for sure reduce the size of your alignment.
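As a rough illustration of what removing sequences that are X% identical means, here is a small Python sketch of a greedy filter over an existing alignment. It is not how cd-hit works internally (cd-hit uses its own clustering heuristics and runs on unaligned sequences); the function names, the `protected` argument and the toy data are assumptions made for this example. For 3000 real sequences a dedicated tool will be much faster.

```python
def identity(a, b):
    """Fraction of identical sites among columns where neither sequence has a gap."""
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def remove_redundancy(aligned, threshold=0.99, protected=()):
    """Greedy filter over a dict of name -> aligned sequence.

    A reference sequence is kept only if its identity to every reference
    already kept is below `threshold`. Names in `protected` (e.g. your own
    isolates) are always kept and are never used to discard references.
    """
    protected = set(protected)
    kept_refs = {}
    for name, seq in aligned.items():
        if name in protected:
            continue
        if all(identity(seq, other) < threshold for other in kept_refs.values()):
            kept_refs[name] = seq
    # Preserve the original order of the alignment.
    return {n: s for n, s in aligned.items() if n in protected or n in kept_refs}


# Toy alignment (hypothetical names and sequences).
aligned = {
    "ref1": "ACGTACGTAC",
    "ref2": "ACGTACGTAC",        # identical to ref1, will be dropped
    "ref3": "ACGTTCGTAA",        # 80% identical to ref1, kept at a 95% threshold
    "my_isolate": "ACGTACGTAC",  # protected, always kept
}
reduced = remove_redundancy(aligned, threshold=0.95, protected={"my_isolate"})
print(list(reduced))
```

Lowering the threshold removes more references, so it is worth trying a few values and checking how many sequences remain each time.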
To remove redundancy I prefer to use Jalview (a multiple alignment viewer and editor). You select all but your sequences and then click "remove redundancy", where you can select different thresholds. Also make sure to remove strange, largely incomplete sequences and poorly aligned regions.
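If you want a quick list of candidates to inspect before opening the alignment in a viewer, a short Python sketch like the following can flag largely incomplete sequences and gap-rich columns. It only reports them and deletes nothing, so the decision stays manual. The function names, thresholds and toy alignment are assumptions for illustration and have nothing to do with Jalview's own implementation.

```python
def flag_incomplete(aligned, max_gap_fraction=0.5):
    """Names of sequences whose fraction of gap characters exceeds the threshold."""
    flagged = []
    for name, seq in aligned.items():
        gap_fraction = seq.count("-") / len(seq) if seq else 1.0
        if gap_fraction > max_gap_fraction:
            flagged.append(name)
    return flagged


def gappy_columns(aligned, max_gap_fraction=0.9):
    """0-based indices of alignment columns whose gap fraction exceeds the threshold."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all be aligned to the same length")
    flagged = []
    for i in range(length):
        gaps = sum(1 for s in seqs if s[i] == "-")
        if gaps / len(seqs) > max_gap_fraction:
            flagged.append(i)
    return flagged


# Toy alignment (hypothetical names and sequences).
aln = {
    "a": "ACGTACGT",
    "b": "AC----GT",
    "c": "------GT",
    "d": "ACGTAC-T",
}
print(flag_incomplete(aln, max_gap_fraction=0.5))
print(gappy_columns(aln, max_gap_fraction=0.4))
```

Reasonable thresholds depend on how long your sequenced fragment is compared with the full-length 16S references, so treat the defaults here as placeholders.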
Oh yes, I know Jalview! I forgot it could remove redundancy. Anyhow, I have already tried to upload those sequences, but sadly my PC can't handle them, so I think I need a better PC to perform this task ^^". Thank you for the advice, I was pretty lost with phylogenetic trees!