Phylogenetic Analysis
14.1 years ago
User 0063 ▴ 240

Hi all,

I'm new to bioinformatics. I need to carry out a phylogenetic analysis of a protein sequence using the maximum likelihood method. Could you give any advice on building a good MSA and tree? I mean software, tips and so on.

Thank you very much

Citing Wikipedia: "Phylum" is adopted from the Greek φυλαί phylai, the clan-based voting groups in Greek city-states.

14.1 years ago
Dave Lunt ★ 2.0k

If you really are new to bioinformatics, I would suggest it will be easiest to use one of the excellent online phylogeny pipelines rather than choosing and installing programmes locally.

www.phylogeny.fr is excellent and easy to use. It can do alignment using MUSCLE and fast maximum likelihood using PhyML on up to 200 protein sequences; in fact, these are the defaults. Both programs are among the best available, so this will give you a fast, high-quality tree.

There are other online options too (e.g. CIPRES), but Phylogeny.fr is easy to use and high quality. It will even display the tree nicely at the end!

CIPRES is great if you have many long sequences (RAxML works well for that, and they run the HPC version).

14.1 years ago

Using online systems isn't always the fastest way to get results, although it is generally much easier, as you control the program inputs through a web interface. If you are comfortable using the command line, you can download binaries that will give you results much faster (depending on your machine's specs, of course).

For building the alignments, I'd use MAFFT as my first choice, with MUSCLE a close second (although FastTree recommends MUSCLE). For building the trees, however, I'd recommend FastTree over PhyML.

FastTree approximates maximum likelihood: it builds an initial tree with heuristic neighbour-joining, refines it using minimum evolution, and then maximizes the tree's likelihood. It also takes input in FASTA format, which means you don't need to convert to PHYLIP format, as you would with PhyML or PHYLIP.
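For readers new to neighbour-joining, the sketch below shows the classic, exact Saitou-Nei algorithm in Python. It is not FastTree's implementation, which adds heuristics so that it scales to large alignments. The function name `neighbor_joining` and its inputs (a list of taxon names and a symmetric distance matrix) are illustrative assumptions. The algorithm repeatedly joins the pair of nodes that minimises the Q criterion until three nodes remain, then attaches those three to a central node.

```python
from typing import List, Optional, Tuple


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Classic (exact) neighbour-joining; returns an unrooted tree in Newick format.

    `names` are taxon labels; `dist` is a symmetric distance matrix with a zero
    diagonal. Needs at least three taxa. Negative branch lengths, which NJ can
    produce, are left as they are.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # work on a float copy
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # total distance from each node to all others
        # Pick the pair (i, j) that minimises the Q criterion.
        best: Optional[Tuple[float, int, int]] = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every node that is kept.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: attach them to a single central node.
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{la:.4f},{nodes[1]}:{lb:.4f},{nodes[2]}:{lc:.4f});"


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(neighbor_joining(taxa, matrix))
```

On this small five-taxon matrix, the first step joins a and b, and the printed tree is `(d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);`. Each step scans all pairs, so a full run costs O(n³) in the number of taxa. That cost is the reason tools such as FastTree use faster heuristics instead.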

Cheers,

Steve

Thank you very much for your valuable suggestions. I'd also like to ask another question. I've performed a PSI-BLAST search against the nr protein database. Which sequences should I use to build the MSA?

Best regards

I think you would need to use the filtered PSI-BLAST output alignment files, usually named in the format queryname-originalfastaname_psiali.fasta. I'm not sure whether you would need to remove the gaps first; perhaps someone can clarify that?
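You may decide to re-align the PSI-BLAST hits yourself, for example with MAFFT or MUSCLE as suggested above. In that case, one cautious option is to strip the gap characters first, so the aligner starts from unaligned sequences. Below is a minimal Python sketch that assumes the hits are in an aligned FASTA file. The function names and the choice of '-' and '.' as gap symbols are assumptions; they are not part of any PSI-BLAST tool.

```python
from typing import List, Tuple

Record = Tuple[str, str]


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; sequences may span lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:  # ignore stray lines before the first header
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def degap(records: List[Record], gap_chars: str = "-.") -> List[Record]:
    """Remove gap characters (assumed here to be '-' and '.') from every sequence."""
    table = str.maketrans("", "", gap_chars)
    return [(h, s.translate(table)) for h, s in records]


def write_fasta(records: List[Record]) -> str:
    """Serialise records back to FASTA, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    aligned = ">query\nMK-LV--A\n>hit_1\nMKQLV.GA\n"
    print(write_fasta(degap(read_fasta(aligned))), end="")
```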

14.1 years ago
Paulo Nuin ★ 3.7k

Your best option for alignment is MAFFT, and I would recommend PHYLIP to calculate your tree, even though there might be faster options out there. In this case you would need a file converter to convert from FASTA to PHYLIP format.
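The sketch below shows what such a converter has to do. It turns an aligned FASTA string into sequential PHYLIP format, which starts with a line giving the number of sequences and the number of sites. Each following line holds a name padded or truncated to 10 characters, then the aligned sequence.

This assumes classic "strict" PHYLIP naming with 10-character names; PhyML and other newer programs also accept longer names. The function names are hypothetical. Biopython's `AlignIO.convert` is a ready-made alternative.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fasta_to_phylip(text: str, name_width: int = 10) -> str:
    """Convert an aligned FASTA string to sequential PHYLIP with fixed-width names."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no sequences found")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences differ in length; align them first")
    lines = [f"{len(records)} {length}"]  # header: number of taxa, number of sites
    seen = set()
    for header, seq in records:
        # Keep the first word of the header, truncated/padded to name_width.
        name = (header.split() or [""])[0][:name_width]
        if name in seen:
            raise ValueError(f"names clash after truncation: {name!r}")
        seen.add(name)
        lines.append(name.ljust(name_width) + seq)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">human_protein\nMKV-L\n>mouse_protein\nMKVAL\n"
    print(fasta_to_phylip(fasta), end="")
```

The duplicate-name check matters because truncation to 10 characters can silently merge two different taxa into the same name.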

14.1 years ago
Thaman ★ 3.3k

First you need to understand the definition of phylogenetics: according to Wikipedia, a phylogenetic tree is simply a tree of evolutionary relationships. A relationship here means the sharing of common features, which can be further defined in terms of orthologs and paralogs. So a phylogenetic tree can be built from orthologous or paralogous sequences.

You can draw a phylogenetic tree as follows:

  • Multiple sequence alignment (MSA) - There are plenty of alignment tools available online; reliable ones include ClustalW, T-Coffee and MUSCLE.

  • Phylogeny approach - An MSA alone does not give you a tree, so you need to apply a phylogenetic method to the generated alignment, which can be done with PHYLIP. PHYLIP offers different methods such as parsimony, distance matrix, maximum likelihood, bootstrapping, etc. For proteins you can use PROTDIST in particular. Similarly, use Seqboot for bootstrapping, Proml for maximum likelihood and Consense for the consensus tree.

  • Visualization - Finally, once the phylogeny has been inferred, you can display the tree. Good visualization tools include TreeView and TreeDyn.
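To make the distance-matrix step more concrete, the sketch below computes pairwise protein distances from an alignment. It first takes the proportion p of differing residues over the columns where neither sequence has a gap. It then applies Kimura's (1983) empirical correction D = -ln(1 - p - 0.2p²), which PROTDIST offers as one of its distance options (check the PHYLIP documentation for your version). PROTDIST's substitution-matrix models such as JTT are not reproduced here. The function names are hypothetical.

```python
import math
from typing import List


def kimura_protein_distance(a: str, b: str, gaps: str = "-.") -> float:
    """Kimura (1983) approximate protein distance, D = -ln(1 - p - 0.2 p^2).

    p is the fraction of differing residues over columns where neither
    sequence has a gap. Residues are compared case-sensitively.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in gaps or y in gaps:
            continue  # skip columns with a gap in either sequence
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    p = diffs / compared
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        return math.inf  # too divergent: the formula is undefined here
    return -math.log(arg)


def distance_matrix(seqs: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise Kimura distances for aligned sequences."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = kimura_protein_distance(seqs[i], seqs[j])
    return m


if __name__ == "__main__":
    aln = ["MKVLAAG", "MKVLACG", "MRVL-CG"]
    for row in distance_matrix(aln):
        print(" ".join(f"{x:.4f}" for x in row))
```

A matrix like this is the input that a distance method such as neighbour-joining turns into a tree.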

You can also go through my question archive, which is mostly related to phylogenetic tree generation.

14.1 years ago
Rm 8.3k

Use PHYLIP, MEGA or TREE-PUZZLE.

A simple pipeline for a set of protein sequences:

multi.fasta --> Alignment (Clustal, MUSCLE or T-Coffee) --> Distance matrix (Protdist) --> Bootstrap (Seqboot) --> Maximum likelihood (Proml) --> Consensus (Consense) --> Visualise the tree (TreeView)

Or, after the alignment step, use TREE-PUZZLE.

TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing.
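Seqboot's role in the PHYLIP pipeline above is to resample alignment columns with replacement, producing pseudo-replicate data sets. In a typical PHYLIP run, these replicates are each analysed by the tree-building program (e.g. Proml, or Protdist followed by a distance method) and then summarised with Consense. Below is a minimal Python sketch of just the non-parametric bootstrap resampling step. The function name is hypothetical, and Seqboot's file formats and other resampling options are not reproduced.

```python
import random
from typing import List, Optional


def bootstrap_replicates(aln: List[str], n_reps: int, seed: Optional[int] = None) -> List[List[str]]:
    """Non-parametric bootstrap: resample alignment columns with replacement.

    Each replicate has the same number of sequences and columns as the input;
    a given column may appear several times or not at all.
    """
    if not aln:
        raise ValueError("empty alignment")
    length = len(aln[0])
    if any(len(row) != length for row in aln):
        raise ValueError("alignment rows must all have the same length")
    rng = random.Random(seed)  # seeded for reproducible replicates
    replicates = []
    for _ in range(n_reps):
        cols = [rng.randrange(length) for _ in range(length)]
        # Use the same column indices for every sequence, so columns stay intact.
        replicates.append(["".join(row[c] for c in cols) for row in aln])
    return replicates


if __name__ == "__main__":
    for rep in bootstrap_replicates(["MKVLAAG", "MKVLACG", "MRVL-CG"], n_reps=3, seed=42):
        print(rep)
```

Resampling whole columns, rather than individual residues, keeps each site's pattern across taxa intact. That is what lets the support values from the consensus tree be read as bootstrap percentages.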

14.1 years ago

Put all your sequences in a single file in FASTA format, where each sequence starts with a header line beginning with the ">" symbol. Upload this file to http://www.ebi.ac.uk/clustalw and then download the resulting .aln file. Next, download the free program Jalview, load your .aln file into it, and build and view the phylogenetic tree there. You can download Jalview from http://www.jalview.org/download.html
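If your sequences are scattered across several places, a small script can assemble them into the single multi-FASTA file described above. Below is a minimal Python sketch. The function name and the 60-residue line width are assumptions; most tools also accept an unwrapped sequence on a single line.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Build multi-FASTA text: a '>' header line per sequence, then the sequence wrapped at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        seq = "".join(seq.split())  # remove spaces/newlines pasted into the sequence
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    proteins = [("query", "MKVLAAGLLLLAAA"), ("homolog_1", "MKVLACGLLL LAAA")]
    print(to_fasta(proteins), end="")
```

Write the returned text to a file, for example with `open("my_proteins.fasta", "w")`, and that file can then be uploaded to the ClustalW server.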

Bowang

14.1 years ago

ClustalW2, T-Coffee and MUSCLE are all good MSA tools, although they are quite different in implementation.

To estimate the tree, use PHYLIP, which can apply parsimony and maximum likelihood (ML) methods.
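To make "maximum likelihood" a little more concrete, here is a toy Python sketch. It is not how PHYLIP or PhyML work internally; they optimise branch lengths and topologies of whole trees under richer models such as JTT or WAG. The setting is two aligned protein sequences with k differing sites out of n, under a simple equal-rates model over 20 amino acids. The code maximises the likelihood numerically over the distance d by golden-section search. The result can be checked against the closed-form estimate d = -(19/20) ln(1 - 20p/19), where p = k/n. The model choice and function names are assumptions made for illustration.

```python
import math


def poisson_loglik(d: float, n: int, k: int) -> float:
    """Log-likelihood (up to a constant) of k differing sites out of n at distance d.

    Equal-rates model over 20 amino acids: a site stays identical with
    probability (1 + 19q)/20 and ends in a particular different residue with
    probability (1 - q)/20, where q = exp(-20d/19).
    """
    q = math.exp(-20.0 * d / 19.0)
    ll = (n - k) * math.log((1.0 + 19.0 * q) / 20.0)
    if k:
        ll += k * math.log((1.0 - q) / 20.0)
    return ll


def ml_distance(n: int, k: int, upper: float = 10.0, tol: float = 1e-9) -> float:
    """Maximise poisson_loglik over d in (0, upper] by golden-section search."""
    if k == 0:
        return 0.0
    if 20 * k >= 19 * n:
        raise ValueError("too many differences: the ML distance is infinite")
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 1e-12, upper
    while hi - lo > tol:
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if poisson_loglik(x1, n, k) > poisson_loglik(x2, n, k):
            hi = x2  # maximum lies in [lo, x2]
        else:
            lo = x1  # maximum lies in [x1, hi]
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    n, k = 100, 20
    p = k / n
    closed_form = -(19.0 / 20.0) * math.log(1.0 - 20.0 * p / 19.0)
    print(f"numerical ML: {ml_distance(n, k):.6f}  closed form: {closed_form:.6f}")
```

The log-likelihood has a single peak in d, which is why a simple bracketing search like golden-section finds the same answer as the formula. Real tree programs face a far larger search space, because every branch length and topology must be optimised.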

You can also try Bayesian estimation of phylogeny, which is very different from the ML method. A good tool for that is MrBayes. It's easy to install and run, so it's probably worth a try. Here is a list of other ML and Bayesian phylogeny estimators: http://mrbayes.csit.fsu.edu/links.php
