Phylogenetic Analysis Of Protein Expression
11.4 years ago

Hi Everybody,

I am a bit of an amateur, so please excuse my ignorance.

I am interested in using NCBI BLASTp to determine the phylogenetic distribution of a certain protein. I want to automate the search for the genes that encode it and build a phylogenetic tree that shows which species have the protein and which don't. From this, I hope to estimate the rough evolutionary point at which the protein arose.

Are there programs that will do this for me?

If not, how might I use a library such as BioPython to perform this function?

Thanks!

biopython • 2.3k views
11.4 years ago
Josh Herr 5.8k

Don't worry about being an amateur here, Griffin Lester; we all started at the beginning. I feel like an amateur daily around here.

First off, could you clarify what you mean by "Protein Expression"? My understanding is that you want to construct a phylogeny of protein sequences to understand their homology or shared function.

Here's a rundown of what I typically do when constructing a phylogeny, from a previous question: Bacterial Phylogeny.

I don't know of any automated program that goes from a BLAST search to a fully constructed phylogeny -- if one exists, I would HIGHLY recommend NOT using it. Constructing a phylogeny is a process in which you have to evaluate and judge the analysis at each step (you can argue that all of bioinformatics and data analysis is like this), so in my opinion full automation is not advisable. If you do automate, look at your trees with extreme skepticism. In particular, I recommend inspecting your sequence alignment to identify sites of homology and variation before constructing your phylogeny.
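To make the "inspect your alignment" step concrete, here is a minimal sketch (not a substitute for looking at the alignment in a viewer). It assumes you already have the aligned sequences as equal-length strings with `-` or `.` for gaps; the function name `column_summary` and its return layout are my own invention for illustration.

```python
from collections import Counter


def column_summary(aligned_seqs, gap_chars="-."):
    """Summarize each alignment column.

    Returns one (most_common_residue, conservation, gap_fraction) tuple per
    column, where conservation is the share of non-gap residues that match
    the most common residue.
    """
    lengths = {len(seq) for seq in aligned_seqs}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of equal length")

    n_seqs = len(aligned_seqs)
    summary = []
    for column in zip(*aligned_seqs):
        # Drop gap characters and ignore case when counting residues.
        residues = [c.upper() for c in column if c not in gap_chars]
        gap_fraction = 1 - len(residues) / n_seqs
        if residues:
            residue, count = Counter(residues).most_common(1)[0]
            conservation = count / len(residues)
        else:
            # Column made up entirely of gaps.
            residue, conservation = None, 0.0
        summary.append((residue, conservation, gap_fraction))
    return summary


if __name__ == "__main__":
    toy_alignment = ["MKT-A", "MKS-A", "MRTQA"]  # made-up sequences
    for i, (res, cons, gaps) in enumerate(column_summary(toy_alignment), 1):
        print(f"col {i}: {res} conservation={cons:.2f} gaps={gaps:.2f}")
```

Columns that are mostly gaps, or where conservation is very low across the whole alignment, are the ones I would look at by eye before trusting any tree built from them.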

You can construct a pipeline that does this while letting you inspect each step along the way; if you are familiar with BioPython, this should not be difficult.
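As a rough sketch of the first stage of such a pipeline, the code below runs a remote BLASTp search and tallies which species returned a hit. Several things here are assumptions on my part: the helper names are made up, the `1e-5` e-value cutoff is arbitrary, hit titles are assumed to follow the common `description [Genus species]` layout, and the `Bio.Blast.NCBIWWW` / `NCBIXML` modules are the interface from the Biopython releases I know (newer releases may prefer a different entry point, so check the docs for your version).

```python
import re


def species_from_title(title):
    """Extract the organism name from an NCBI-style hit title.

    Assumes the layout 'description [Genus species]'; returns None when no
    bracketed organism is present.
    """
    first_defline = title.split(" >")[0]  # merged entries are joined by ' >'
    found = re.findall(r"\[([^\[\]]+)\]", first_defline)
    # Protein names can contain brackets too, so take the last bracketed group.
    return found[-1] if found else None


def species_with_hits(hits, max_evalue=1e-5):
    """Map species -> best (lowest) e-value among hits passing the cutoff.

    `hits` is an iterable of (title, evalue) pairs.
    """
    best = {}
    for title, evalue in hits:
        species = species_from_title(title)
        if species is None or evalue > max_evalue:
            continue
        if species not in best or evalue < best[species]:
            best[species] = evalue
    return best


def blastp_hits(protein_sequence, database="nr", hitlist_size=500):
    """Run a remote BLASTp search and yield (title, best e-value) per hit.

    Needs Biopython and network access, so the import is local to keep the
    helpers above usable without either.
    """
    from Bio.Blast import NCBIWWW, NCBIXML

    handle = NCBIWWW.qblast(
        "blastp", database, protein_sequence, hitlist_size=hitlist_size
    )
    record = NCBIXML.read(handle)
    for alignment in record.alignments:
        yield alignment.title, min(hsp.expect for hsp in alignment.hsps)
```

Usage would be something like `species_with_hits(blastp_hits(my_sequence))`, where `my_sequence` is your query protein. Keep in mind that a species missing from the result only means no hit passed the cutoff in that database; it is not proof the protein is absent, which is one more reason to check each step by hand.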

The process is (generally) the same whether you use protein (amino acid) sequences or nucleotide sequences. In fact, it's my observation that most reviewers now expect an analysis of both the protein and nucleotide sequences for phylogenies of gene coding regions.

11.4 years ago
cdsouthan ★ 1.9k

For anything in Ensembl, though, take a look at the GeneTree display. The tree is pre-cooked for you.

11.4 years ago
Biojl ★ 1.7k

If I understood the question correctly, this paper may suit you. They do a similar analysis in Figure 1.

Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution. Rafik Neme and Diethard Tautz.

http://www.biomedcentral.com/1471-2164/14/117
