I'm new to bioinformatics.
I need to carry out a phylogenetic analysis of a protein sequence.
I'd like to use the maximum likelihood method.
Could you give me any advice on building a good MSA and tree?
I mean software, tips, and so on.
If you really are new to bioinformatics, I would suggest that it will be easiest to use one of the excellent online phylogeny pipelines rather than choosing and installing programmes locally.
www.phylogeny.fr is excellent and easy to use. It can do alignment using MUSCLE and fast maximum likelihood using PhyML on up to 200 protein sequences; in fact, these are the defaults. Both programs are among the best available, so this should give you a fast, high-quality tree.
There are other online options too (e.g. CIPRES), but Phylogeny.fr is easy to use and high quality. It will even display the tree nicely at the end!
Using online systems isn't always the fastest way to get results, although it is generally much easier, since you control the program inputs through a web interface. If you are comfortable with the command line, you can download binaries that will give you your results much faster (depending on your machine's specs, of course).
For building the alignments, I'd use MAFFT as my first choice, with MUSCLE a close second (although FastTree recommends MUSCLE). For building the trees, however, I'd recommend FastTree over PhyML.
FastTree approximates maximum likelihood: it builds an initial tree by heuristic neighbour-joining using a minimal model of evolution, before maximizing the tree's likelihood, as detailed here. It also takes input in FASTA format, which means you don't need to convert to PHYLIP format as you would with PhyML or PHYLIP.
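If you go the command-line route, the MAFFT + FastTree combination is easy to script. Below is a minimal Python sketch, assuming both programs are installed and on your PATH as `mafft` and `FastTree` (binary names can differ between installs, e.g. `FastTreeMP`); the output file names are hypothetical. It only uses MAFFT's `--auto` option and FastTree's defaults (protein input), and relies on both programs writing their result to standard output.

```python
import shutil
import subprocess
from pathlib import Path


def mafft_command(fasta_in: Path) -> list[str]:
    # --auto lets MAFFT choose an alignment strategy based on the input size.
    return ["mafft", "--auto", str(fasta_in)]


def fasttree_command(alignment: Path) -> list[str]:
    # FastTree reads the aligned FASTA directly and assumes protein by default.
    return ["FastTree", str(alignment)]


def run_to_file(cmd: list[str], out_path: Path) -> None:
    # Fail early with a clear message if the program is not installed.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} was not found on PATH")
    # Both tools print their result to stdout, so redirect it into a file.
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


def align_and_build_tree(fasta_in: Path, workdir: Path) -> Path:
    # Hypothetical output names; the tree is written in Newick format.
    aligned = workdir / "aligned.fasta"
    tree = workdir / "tree.nwk"
    run_to_file(mafft_command(fasta_in), aligned)
    run_to_file(fasttree_command(aligned), tree)
    return tree
```

Usage would be something like `align_and_build_tree(Path("my_proteins.fasta"), Path("."))`, after which tree.nwk can be opened in any tree viewer.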
Thank you very much for your valuable suggestions.
I'd also like to ask another question.
I've performed a PSI-BLAST search against the nr protein database.
Which sequences should I use to build the MSA?
I think you would need to use the filtered PSI-BLAST output alignment files? These are usually named in the format queryname-originalfastaname_psiali.fasta. I'm not sure whether you would need to remove the gaps first. Perhaps someone can clarify that, though?
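If you do decide to take the hit sequences out of the PSI-BLAST alignment and realign them yourself (e.g. with MAFFT), removing the existing gap characters is a simple step. Here is a minimal Python sketch, assuming the file is ordinary aligned FASTA and that `-` and `.` are the only gap characters. It does not decide which hits to keep; that is still a judgment call about your protein family.

```python
def read_fasta(lines):
    # Parse FASTA lines into (header, sequence) pairs; sequences may span lines.
    records, header, chunks = [], None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_gaps(records, gap_chars="-."):
    # Delete every gap character so each sequence is unaligned again.
    table = str.maketrans("", "", gap_chars)
    return [(header, seq.translate(table)) for header, seq in records]
```

With a file, `strip_gaps(read_fasta(open(path)))` works because iterating over a file object yields its lines.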
Your best option for alignment is MAFFT, and I would recommend PHYLIP to calculate your tree, even though there may be some faster options out there. In this case you would need a file converter to convert from FASTA to PHYLIP format.
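Existing tools can do this conversion for you (for example EMBOSS seqret or Biopython's AlignIO). If you want to see what it involves, here is a minimal Python sketch that writes the strict sequential PHYLIP format used by the PHYLIP programs, where each name is cut or padded to exactly 10 characters. It assumes the FASTA is already aligned and keeps only the first word of each header; the example names in the check are hypothetical.

```python
def read_fasta(text):
    # Parse FASTA text into (name, sequence) pairs, keeping the first word of each header.
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            words = line[1:].split()
            if not words:
                raise ValueError("FASTA header without a name")
            name, chunks = words[0], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_strict_phylip(records):
    # Sequential PHYLIP: a "ntax nchar" line, then a 10-character name field + sequence.
    if not records:
        raise ValueError("no sequences found")
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("sequences differ in length; align them first")
    lines = [f"{len(records)} {nchar}"]
    used = set()
    for name, seq in records:
        short = name[:10]
        if short in used:
            # Truncation can make two names identical, which PHYLIP cannot tell apart.
            raise ValueError(f"duplicate name after truncation: {short}")
        used.add(short)
        lines.append(f"{short:<10}{seq}")
    return "\n".join(lines) + "\n"
```

Keeping names unique within the first 10 characters before aligning saves some trouble later.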
First, you need to understand what phylogenetics is. According to Wikipedia, a phylogeny is simply a tree of evolutionary relationships. "Relationship" here means the sharing of common ancestry, which can be further defined in terms of orthologs and paralogs. So a phylogenetic tree can be built from orthologs or from paralogs.
Phylogeny approach - An MSA alone does not give you a phylogenetic tree, so you need to apply a phylogenetic method to the alignment, which can be done with PHYLIP. PHYLIP offers different methods such as parsimony, distance matrix, maximum likelihood, bootstrapping, etc. In your case you can use PROTDIST in particular, which computes distances between protein sequences. Similarly, SEQBOOT can be used for bootstrapping, PROML for maximum likelihood, and CONSENSE for building a consensus tree.
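To give a feel for what a program like PROTDIST computes, here is a minimal Python sketch of a pairwise protein distance matrix. It takes the proportion of differing sites (p-distance) and corrects it with Kimura's empirical formula, D = -ln(1 - p - 0.2p²), which is one of the options PROTDIST offers; its matrix-based models (e.g. JTT) are more involved and are not reproduced here. Skipping gap positions pair by pair is an assumption of this sketch.

```python
import math
from itertools import combinations


def p_distance(a, b, gap="-"):
    # Fraction of differing residues among sites where neither sequence has a gap.
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free sites to compare")
    return differences / compared


def kimura_protein_distance(p):
    # Kimura's approximation; returns inf when the sequences are too divergent.
    arg = 1.0 - p - 0.2 * p * p
    return math.inf if arg <= 0 else -math.log(arg)


def distance_matrix(records):
    # Symmetric matrix of corrected distances for (name, aligned_sequence) pairs.
    names = [name for name, _ in records]
    n = len(records)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        p = p_distance(records[i][1], records[j][1])
        d[i][j] = d[j][i] = kimura_protein_distance(p)
    return names, d
```

The correction matters because the raw p-distance levels off as sequences diverge, while the corrected distance keeps growing with the number of substitutions.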
Visualization - Finally, once the phylogenetic analysis is done, you can draw the tree. Good visualization tools include TreeView and TreeDyn.
You can also look through my question archive, most of which relates to phylogenetic tree generation.
multi.fasta --> Alignment (ClustalW, MUSCLE or T-Coffee) --> Bootstrap replicates (SEQBOOT) --> Trees for each replicate: maximum likelihood (PROML), or distance matrix (PROTDIST) plus a distance tree (NEIGHBOR) --> Consensus (CONSENSE) --> Visualise the tree (TreeView)
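The bootstrap step (what SEQBOOT does) is simple to picture: each replicate is a new alignment of the same length, made by sampling columns of the original alignment at random with replacement. A tree is then built from every replicate, and CONSENSE summarises how often each group of sequences appears. Here is a minimal Python sketch of the resampling part only; the seed argument is just there to make runs reproducible.

```python
import random


def bootstrap_replicates(records, n_replicates, seed=None):
    # records: list of (name, aligned_sequence); all sequences must be equal length.
    if not records:
        raise ValueError("no sequences given")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        # Draw `length` column indices with replacement; the same draw is used for
        # every sequence so that whole columns are resampled together.
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append(
            [(name, "".join(seq[c] for c in columns)) for name, seq in records]
        )
    return replicates
```

Resampling whole columns, rather than each sequence independently, is what keeps the residues of one alignment position together in every replicate.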
Put all your sequences in a single file in FASTA format: each sequence starts with a single header line beginning with the ">" symbol, followed by the sequence itself. Upload this file to http://www.ebi.ac.uk/clustalw
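If your sequences come from several files or from a script, a small helper can write them into one FASTA file ready to upload. Minimal Python sketch; the 60-residue line width is a common convention rather than a requirement, and the file and sequence names in the usage line are hypothetical.

```python
from pathlib import Path


def format_fasta(records, width=60):
    # Each record becomes a ">name" header line followed by wrapped sequence lines.
    lines = []
    for name, seq in records:
        if not name.strip():
            raise ValueError("every sequence needs a name")
        lines.append(">" + name.strip())
        seq = "".join(seq.split())  # remove any whitespace inside the sequence
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def write_fasta(records, path, width=60):
    # Write all records into a single multi-FASTA file.
    Path(path).write_text(format_fasta(records, width))
```

For example: `write_fasta([("protein1", "MKVLAAGI"), ("protein2", "MKILAAGV")], "all_sequences.fasta")`.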
Then download the .aln file. Next, download the free program Jalview and load your .aln file into it; Jalview can then calculate and display a phylogenetic tree from the alignment. You can download Jalview from http://www.jalview.org/download.html
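Jalview calculates trees from the alignment itself, and neighbour joining is one of the methods it offers. To show what that method does, here is a minimal textbook neighbour-joining sketch in Python (not Jalview's code). It takes taxon names and a symmetric distance matrix, for instance one produced by a program like PROTDIST, and returns an unrooted Newick string. The five-taxon matrix in the example is made up and perfectly tree-like, so NJ recovers the tree exactly.

```python
def neighbor_joining(names, distances):
    # names: taxon labels; distances: symmetric matrix as a list of lists.
    if len(names) < 3:
        raise ValueError("neighbour joining needs at least three taxa")
    nodes = list(names)                      # Newick text for each current cluster
    dist = [list(row) for row in distances]  # working copy, shrinks each round
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in dist]
        # Pick the pair (i, j) minimising the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * dist[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j
        # (these can come out negative on non-tree-like data).
        li = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining cluster.
        new_row = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]
        dist = [[dist[a][b] for b in keep] for a in keep]
        for row, value in zip(dist, new_row):
            row.append(value)
        dist.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Join the last three clusters at one central node (unrooted tree).
    d01, d02, d12 = dist[0][1], dist[0][2], dist[1][2]
    l0 = (d01 + d02 - d12) / 2
    l1 = (d01 + d12 - d02) / 2
    l2 = (d02 + d12 - d01) / 2
    return f"({nodes[0]}:{l0:.4f},{nodes[1]}:{l1:.4f},{nodes[2]}:{l2:.4f});"


example_names = ["a", "b", "c", "d", "e"]
example_distances = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(example_names, example_distances))
# prints (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The result is unrooted; where the root goes (e.g. an outgroup) is a separate decision you make when viewing the tree.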
ClustalW2, T-Coffee, and MUSCLE are all good MSA tools, although they are quite different in implementation.
To estimate the tree, use PHYLIP, which can apply parsimony and maximum likelihood (ML) methods.
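To make "maximum likelihood" a bit more concrete, here is the simplest possible case: estimating the distance between two aligned protein sequences under the Poisson model, where every amino acid is assumed equally likely to change into any other. This is only an illustration; PROML and PhyML optimise whole trees under empirical substitution models, which is far more involved. The Python sketch maximises the log-likelihood numerically (golden-section search) and can be checked against the closed-form answer d = -(19/20) ln(1 - 20p/19), where p is the proportion of differing sites. The site counts used in the check are made up.

```python
import math

K = 20  # number of character states (amino acids) in the Poisson model


def log_likelihood(d, n_same, n_diff, k=K):
    # d = expected substitutions per site separating the two sequences.
    change = -math.expm1(-k * d / (k - 1))        # 1 - exp(-k*d/(k-1))
    p_same = 1.0 / k + (k - 1) / k * (1.0 - change)
    p_specific_other = change / k                  # one particular other residue
    return n_same * math.log(p_same) + n_diff * math.log(p_specific_other)


def poisson_distance(p, k=K):
    # Closed-form maximum-likelihood estimate for a proportion p of differing sites.
    arg = 1.0 - k * p / (k - 1)
    return math.inf if arg <= 0 else -(k - 1) / k * math.log(arg)


def ml_distance(n_same, n_diff, k=K, upper=10.0, tol=1e-10):
    # Golden-section search for the d in (0, upper] that maximises the likelihood.
    if n_diff == 0:
        return 0.0
    if n_diff / (n_same + n_diff) >= (k - 1) / k:
        return math.inf  # saturated: the likelihood keeps rising as d grows

    def f(d):
        return log_likelihood(d, n_same, n_diff, k)

    ratio = (math.sqrt(5) - 1) / 2
    lo, hi = 1e-12, upper
    a, b = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    while hi - lo > tol:
        if f(a) < f(b):
            lo, a = a, b
            b = lo + ratio * (hi - lo)
        else:
            hi, b = b, a
            a = hi - ratio * (hi - lo)
    return (lo + hi) / 2
```

For 70 identical and 30 differing sites, `ml_distance(70, 30)` and `poisson_distance(0.3)` should agree to about six decimal places.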
You can also try Bayesian estimation of phylogeny, which is quite different from the ML method. A good tool for that is MrBayes. It's easy to install and run, so it's probably worth a try. Here is a list of other ML and Bayesian phylogeny estimators:
http://mrbayes.csit.fsu.edu/links.php
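MrBayes reads NEXUS files, so an aligned FASTA file needs converting first. Here is a minimal Python sketch that writes a NEXUS data block plus a small MrBayes command block. The commands used (prset aamodelpr=mixed, mcmc, sump, sumt) are standard MrBayes commands, but the ngen and samplefreq values are placeholders, not recommendations; check convergence before trusting the tree. Replacing every character other than letters, digits and underscores in the names is an assumption made to keep NEXUS parsing simple.

```python
import re


def to_mrbayes_nexus(records, ngen=100000, samplefreq=100):
    # records: list of (name, aligned_protein_sequence) pairs of equal length.
    if not records:
        raise ValueError("no sequences given")
    nchar = len(records[0][1])
    if any(len(seq) != nchar for _, seq in records):
        raise ValueError("sequences must be aligned to the same length")
    lines = [
        "#NEXUS",
        "",
        "begin data;",
        f"  dimensions ntax={len(records)} nchar={nchar};",
        "  format datatype=protein gap=- missing=?;",
        "  matrix",
    ]
    used = set()
    for name, seq in records:
        # Replace characters that would need quoting in NEXUS (spaces, dashes, etc.).
        safe = re.sub(r"[^A-Za-z0-9_]", "_", name)
        if safe in used:
            raise ValueError(f"duplicate name after sanitising: {safe}")
        used.add(safe)
        lines.append(f"    {safe} {seq}")
    lines += [
        "  ;",
        "end;",
        "",
        "begin mrbayes;",
        "  prset aamodelpr=mixed;",  # average over fixed-rate protein models
        f"  mcmc ngen={ngen} samplefreq={samplefreq};",
        "  sump;",
        "  sumt;",
        "end;",
    ]
    return "\n".join(lines) + "\n"
```

Save the returned text to a file and run it with MrBayes (for example `mb alignment.nex`, where the file name is hypothetical).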
Quoting Wikipedia: "Phylum" is adopted from the Greek φυλαί phylai, the clan-based voting groups in Greek city-states.