Question

How Can I Compare Rates Of Evolution For Two Sets Of Genes?

3

12.4 years ago

terdon ▴ 430

I have a list of candidate genes as the result of my analysis. I am now trying to find various characteristics that they have in common. One of the things I would like to check is if my candidate genes are evolving faster or slower than the rest of the genes in my dataset.

Now, I know how to do this manually by building multi-species alignments for each of my gene products and calculating Ka/Ks ratios for each set of alignments. This, however, is not a trivial process and I really do not want to do this manually for my ~1500 genes.

Can anyone suggest a tool that will take two lists of genes (or proteins) and return an estimate of evolutionary rate for each gene?

evolution • 11k views

ADD COMMENT • link updated 12.4 years ago by Rahul Sharma ▴ 660 • written 12.4 years ago by terdon ▴ 430

score 6 · Answer 1 · 2012-11-21

12.4 years ago

Josh Herr 5.8k

In the past I've used Tajima's D as a measure of sequence evolution. There are numerous methods, but (IMHO) this measure seems to be the best accepted as a way to measure if two (or more) sets of genes are evolving over time.

It's not difficult to compute it on your own, but there are a few scripts out there to do it for you. Check out the MANVa software package, or two helpful scripts: DENSERM_P in Perl and analyzer HKA version 6 in C.
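For anyone who wants to compute it by hand, here is a minimal sketch of Tajima's D in plain Python. It assumes the input is a list of equal-length, already aligned DNA sequences sampled from a single population; the function name `tajimas_d` is only illustrative, and columns containing gaps or ambiguous bases are simply skipped.

```python
from collections import Counter
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for equal-length aligned sequences from one population sample."""
    n = len(seqs)
    if n < 4:
        return None  # the statistic is not usable for fewer than 4 sequences
    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S, the number of segregating sites
    pi = 0.0       # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(base.upper() for base in column)
        if set(counts) - set("ACGT"):
            continue  # skip columns with gaps or ambiguity codes
        if len(counts) < 2:
            continue  # monomorphic column
        seg_sites += 1
        same_pairs = sum(c * (c - 1) / 2 for c in counts.values())
        pi += (n_pairs - same_pairs) / n_pairs
    if seg_sites == 0:
        return None  # D is undefined without segregating sites

    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy sample: every variant is a singleton, so D comes out negative.
print(tajimas_d(["TAAA", "ATAA", "AATA", "AAAT"]))
```

An excess of rare variants (as in the toy sample) pushes D below zero, and an excess of intermediate-frequency variants pushes it above zero.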

This paper helped me measure evolutionary rates using SNP data.

ADD COMMENT • link 12.4 years ago by Josh Herr 5.8k

2

Note: Tajima's D is a population genetic statistic - it measures whether a locus is under some non-random force (including population expansion/contraction) by comparing allele frequencies in a (hopefully random) population sample of sequences. From the O.P. it sounds like terdon wants to compare evolutionary rates among species, which is not what D will do.

ADD REPLY • link 12.4 years ago by David W 4.9k

0

+1 on both answers and thanks for the correction. I have in fact used Tajima's D for sequence evolution at the level of populations to look for selection against alleles in putative clonal plants, so that would make sense. I appreciate the clarification.

You have a great blog too. Going to check out your MMOD R library now...

ADD REPLY • link 12.4 years ago by Josh Herr 5.8k

0

Yup, I don't have population data. What I have is a list of candidate proteins (easily mapped back to genes) and a list of non-candidate proteins. Based on my analysis I expect my candidates to be evolving faster than my non-candidates. I was hoping to use something like eggNOG or EGO to map each of my candidates to an orthologous group and obtain a measure of the rate of evolutionary change for each of those groups.

ADD REPLY • link 12.4 years ago by terdon ▴ 430

score 6 · Answer 2 · 2012-11-21

Hi Terdon,

I would choose my favorite scripting language and put together a pipeline. Presuming you have your orthologous sets of sequences sorted already, the process is quite straightforward:

1. Write each sequence set to a separate file
2. Align the sequences with [MUSCLE/T-Coffee/PRANK]
3. Calculate Ka/Ks for each alignment
4. Parse the results and pump them into a CSV file so you can compare rates in candidate vs. "background"
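For the final comparison of candidate vs. background, one common choice (not named in the post) is a Mann-Whitney U test on the per-gene Ka/Ks values. The sketch below uses only the standard library and a normal approximation with a tie correction; the function name and the toy values are made up for illustration, and in practice the two lists would come from the CSV file.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, approximate p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Sort all values, remembering which sample each one came from
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2     # average of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        tie_term += t ** 3 - t
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0                  # all values identical, nothing to test
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))


# Toy Ka/Ks values, for illustration only
candidates = [0.41, 0.35, 0.52, 0.48, 0.39]
background = [0.12, 0.20, 0.15, 0.31, 0.18, 0.22]
u_stat, p_value = mann_whitney_u(candidates, background)
print(u_stat, p_value)
```

The normal approximation is reasonable for a few hundred genes per set; for very small sets an exact test would be the safer choice.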

I would use codeml (from PAML) to estimate one Ka/Ks value for each locus, but be aware that to do that you'd need (a) a tree relating your species to each other and (b) a control file for each analysis. Biopython has a module that deals with codeml, which could potentially make that step easier.
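The control files are plain "key = value" text, so generating one per gene is easy to script. Below is a rough sketch that builds such a file without any third-party library; the helper name `codeml_ctl`, the file names, and the specific option values (a single Ka/Ks ratio per gene via `model = 0` and `NSsites = 0`, the universal genetic code, F3x4 codon frequencies) are assumptions to adapt to your own data.

```python
def codeml_ctl(seqfile, treefile, outfile):
    """Return the text of a codeml control file for a one-ratio (single omega) model."""
    options = {
        "seqfile": seqfile,    # codon alignment for one gene
        "treefile": treefile,  # species tree in Newick format
        "outfile": outfile,    # main results file
        "noisy": 0,            # keep screen output minimal
        "verbose": 0,
        "runmode": 0,          # use the user-supplied tree
        "seqtype": 1,          # 1 = codon sequences
        "CodonFreq": 2,        # 2 = F3x4 codon frequencies
        "clock": 0,            # no molecular clock
        "model": 0,            # one omega for all branches
        "NSsites": 0,          # one omega for all sites
        "icode": 0,            # universal genetic code
        "fix_kappa": 0,        # estimate kappa
        "kappa": 2,            # starting value
        "fix_omega": 0,        # estimate omega (Ka/Ks)
        "omega": 0.4,          # starting value
        "cleandata": 1,        # drop sites with gaps or ambiguity
    }
    lines = [f"{key:>10} = {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


# Hypothetical file names, for illustration only
ctl_text = codeml_ctl("gene1.phy", "species.tre", "gene1.out")
print(ctl_text)
```

Writing `ctl_text` to a file in each gene's own working directory and running codeml there keeps the many output files from overwriting each other.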

(You should also note that Ka/Ks is not really a rate of evolution, but the degree to which variants in a sequence are evolving under positive or negative selection.)

edit: If you do put together a pipeline like this, be sure to check at least a sample of the alignments by eye. Misalignments are a major source of error in automated screens of this sort.

score 0 · Answer 3 · 2012-11-21

12.4 years ago

AGS ▴ 250

Why not fit them to a global vs. local clock? Easy to do in PAML.

ADD COMMENT • link 12.4 years ago by AGS ▴ 250

0

Could you expand on that please?

ADD REPLY • link 12.4 years ago by terdon ▴ 430

0

Run PAML twice: once with your genes set to a local clock, and once with all the data set to a global clock. You could then pull the log-likelihoods from the two runs for each gene, do a likelihood-ratio (chi^2) test, and apply a multiple-testing correction across genes. To find out whether your gene is evolving faster or slower, you need to find the scale factor that is in the PAML output files.
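The testing step could look roughly like the sketch below, in standard-library Python. It assumes you have already parsed the two log-likelihoods per gene from the PAML output and that you know the degrees of freedom (the difference in the number of free parameters between the two clock models); the function names and the lnL values are invented for illustration, and Benjamini-Hochberg is just one possible multiple-testing correction.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        # Closed form for even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return exp(-y) * total
    # Closed form for odd df: erfc(sqrt(y)) plus a finite sum
    total = erfc(sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += exp(-y) * y ** (i - 0.5) / gamma(i + 0.5)
    return total


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood-ratio test: null = simpler model, alt = richer model."""
    stat = max(0.0, 2 * (lnl_alt - lnl_null))
    return chi2_sf(stat, df)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented (global clock lnL, local clock lnL) pairs, df = 1 assumed
runs = {"geneA": (-2310.4, -2305.1), "geneB": (-1822.9, -1822.6), "geneC": (-980.2, -975.0)}
pvals = [lrt_pvalue(null, alt, df=1) for null, alt in runs.values()]
for gene, q in zip(runs, benjamini_hochberg(pvals)):
    print(gene, q)
```

A small adjusted p-value says the global clock is rejected for that gene; the direction (faster or slower) still has to come from the rate estimates in the output.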

ADD REPLY • link 12.4 years ago by AGS ▴ 250