The Effect Of The "invgamma" Setting On Computing The Site Likelihood In MrBayes?
13.4 years ago

The "invgamma" setting in MrBayes specifies gamma-distributed rate variation across sites together with a proportion of invariable sites. The gamma distribution is approximated by a small number of discrete rate categories.
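A minimal sketch of one common way to build those discrete categories: split a mean-one gamma distribution (shape α, rate α) into n equal-probability categories and use the mean of each category as its rate. Whether MrBayes uses category means or medians is not stated in this thread, so treat this as an illustration only. The function names are hypothetical, and the series-based incomplete gamma function is only meant for moderate α.

```python
import math


def lower_reg_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x), summed as a power series.

    Adequate for moderate shape values; not tuned for very large a or x.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, 100000):
        term *= x / (a + k)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def gamma_quantile(alpha: float, p: float) -> float:
    """Quantile of Gamma(shape=alpha, rate=alpha), i.e. mean 1, found by bisection."""
    lo, hi = 0.0, 1.0
    while lower_reg_gamma(alpha, alpha * hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if lower_reg_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, n: int) -> list[float]:
    """Mean rate within each of n equal-probability categories of Gamma(alpha, alpha)."""
    bounds = [0.0] + [gamma_quantile(alpha, i / n) for i in range(1, n)] + [math.inf]
    rates = []
    for lower, upper in zip(bounds, bounds[1:]):
        # E[X; lower < X < upper] = P(alpha + 1, alpha*upper) - P(alpha + 1, alpha*lower)
        p_hi = 1.0 if math.isinf(upper) else lower_reg_gamma(alpha + 1.0, alpha * upper)
        p_lo = lower_reg_gamma(alpha + 1.0, alpha * lower)
        rates.append(n * (p_hi - p_lo))  # divide by the category probability 1/n
    return rates


rates = discrete_gamma_rates(0.5, 4)  # approximately [0.0334, 0.2519, 0.8203, 2.8944]
```

With a small shape parameter most categories have rates well below 1 and one category carries a large rate; as α grows, all rates move toward 1, which is the no-rate-variation limit.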

According to Felsenstein 1981 (Evolutionary Trees from DNA Sequences: A Maximum Likelihood Approach), the site likelihood is computed from the root-node conditional likelihoods by $L=\sum_{s_0} \pi_{s_0} L_{s_0}^{(0)}$. However, when the invgamma setting is used in MrBayes, there will be a root-node conditional likelihood for each discrete rate category.
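A minimal sketch of the pruning step and the root sum $L=\sum_{s_0} \pi_{s_0} L_{s_0}^{(0)}$ for a single site, assuming the Jukes-Cantor (JC69) model so that transition probabilities have a closed form. The tree encoding (a leaf is a one-letter string, an internal node is a tuple of (child, branch length) pairs) and all function names are hypothetical. The `rate` argument multiplies every branch length, which is how each discrete rate category ends up with its own set of root-node conditional likelihoods.

```python
import math

STATES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability over branch length t (expected substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def conditional_likelihoods(node, rate: float = 1.0) -> list[float]:
    """Felsenstein pruning: L_s at `node` for each state s, branch lengths scaled by rate."""
    if isinstance(node, str):  # leaf: observed state
        return [1.0 if s == node else 0.0 for s in STATES]
    result = [1.0] * 4
    for child, t in node:
        child_cl = conditional_likelihoods(child, rate)
        for i in range(4):
            result[i] *= sum(jc69_prob(i == j, rate * t) * child_cl[j] for j in range(4))
    return result


def site_likelihood(tree, pi: list[float], rate: float = 1.0) -> float:
    """Site likelihood: sum over root states of pi_s times the root conditional likelihood."""
    root_cl = conditional_likelihoods(tree, rate)
    return sum(p * l for p, l in zip(pi, root_cl))


tree = (("A", 0.1), ("A", 0.2), ("G", 0.3))
print(site_likelihood(tree, [0.25] * 4))
```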

What effect do the discrete rates have on computing the site likelihood from the root-node conditional likelihoods?

What effect does the "proportion of invariable sites" have on the computation?

In particular, I'm looking for some documentation that answers these questions in detail.

This may be an implementation detail that the MrBayes developers know more about.

13.3 years ago
Botond Sipos

Based solely on theory, I think the answer to your question is the following:

Let $p_{inv}$ be the proportion of invariable sites, $n$ the number of gamma categories, and $r_i$ the rate of the $i$-th category of the discrete gamma model (as determined by the current value of the gamma shape parameter). Let $L(r_i)$ be the site likelihood obtained from the root-node conditional likelihoods when every branch length is multiplied by the fixed rate $r_i$ (I drop the notation for the site and the model parameters).

The +I+G model assumes that site rates follow a mixture of two discrete distributions:

  1. the fixed rate zero, with weight $p_{inv}$;
  2. the discrete gamma distribution, with weight $1 - p_{inv}$ (so each of the $n$ gamma categories has weight $(1 - p_{inv})/n$).
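The mixture above can be written down as a list of (rate, weight) pairs. This is a minimal sketch with a hypothetical function name; the example rates are the four-category mean rates for shape α = 0.5, rounded to four decimals.

```python
def ig_rate_mixture(p_inv: float, gamma_rates: list[float]) -> list[tuple[float, float]]:
    """(rate, weight) pairs: rate 0 with weight p_inv, then n equally weighted gamma rates."""
    n = len(gamma_rates)
    return [(0.0, p_inv)] + [(r, (1.0 - p_inv) / n) for r in gamma_rates]


mixture = ig_rate_mixture(0.2, [0.0334, 0.2519, 0.8203, 2.8944])
weights_total = sum(w for _, w in mixture)   # the weights form a probability distribution
mean_rate = sum(r * w for r, w in mixture)   # (1 - p_inv) times the mean gamma rate
```

The weights sum to one, and the mean rate of the mixture is $(1 - p_{inv})$ times the mean of the gamma rates.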

The site likelihood is then calculated as:

$$L(p_{inv}, r_1, r_2, \ldots, r_n) = p_{inv}\, L(0) + (1 - p_{inv}) \left[ \frac{1}{n} \sum_{i=1}^{n} L(r_i) \right]$$
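A self-contained sketch of this formula for one site, assuming JC69 and a hypothetical tuple-based tree encoding (a leaf is a one-letter string, an internal node is a tuple of (child, branch length) pairs): one pruning pass per rate, then the weighted average. For clarity it also computes $L(0)$ by pruning, even though that term does not strictly need a pruning pass.

```python
import math

STATES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 transition probability; at t = 0 this is the identity (1 or 0)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def site_likelihood_at_rate(tree, pi: list[float], rate: float) -> float:
    """L(rate): pruning with all branch lengths multiplied by `rate`, then the root sum."""
    def cl(node):
        if isinstance(node, str):
            return [1.0 if s == node else 0.0 for s in STATES]
        out = [1.0] * 4
        for child, t in node:
            c = cl(child)
            for i in range(4):
                out[i] *= sum(jc69_prob(i == j, rate * t) * c[j] for j in range(4))
        return out
    return sum(p * l for p, l in zip(pi, cl(tree)))


def ig_site_likelihood(tree, pi: list[float], p_inv: float, gamma_rates: list[float]) -> float:
    """p_inv * L(0) + (1 - p_inv) * mean_i L(r_i)."""
    n = len(gamma_rates)
    l_zero = site_likelihood_at_rate(tree, pi, 0.0)
    l_gamma = sum(site_likelihood_at_rate(tree, pi, r) for r in gamma_rates) / n
    return p_inv * l_zero + (1.0 - p_inv) * l_gamma


tree = (("A", 0.1), ("A", 0.2), ("G", 0.3))
print(ig_site_likelihood(tree, [0.25] * 4, 0.2, [0.0334, 0.2519, 0.8203, 2.8944]))
```

Because the formula is linear, mixing the per-rate site likelihoods gives the same result as first mixing the per-rate root-node conditional likelihoods and then applying the root sum.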

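The likelihood computation has to be repeated on the same site with different fixed rates, so a +I+G model with $n$ gamma categories makes the computation roughly $n$-fold slower than a single-rate model, or $(n+1)$-fold if $L(0)$ is also computed by a full pruning pass. That extra pass is avoidable: with rate zero every transition probability matrix is the identity, so $L(0)$ can be read directly from the data.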

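The $p_{inv}$ parameter is the weight of the likelihood calculated with the fixed rate zero. Because no substitutions can occur at rate zero, $L(0) = \pi_s$ for a site at which every sequence has state $s$ (ignoring ambiguous data), and $L(0) = 0$ for any variable site, so the invariable-sites component contributes only to constant sites.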
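A minimal sketch of the zero-rate term and how $p_{inv}$ weights it, assuming unambiguous nucleotide states (no gaps or ambiguity codes) and stationary frequencies given as a dict. The function names are hypothetical, and the value passed as the averaged gamma-category likelihood in the usage lines is made up for illustration.

```python
def invariant_site_likelihood(site: str, pi: dict[str, float]) -> float:
    """L(0) without pruning: pi_s if every sequence shows state s, else 0.

    Assumes unambiguous states; gaps or IUPAC codes would need extra handling.
    """
    if len(set(site)) == 1:
        return pi[site[0]]
    return 0.0


def pinv_mixture(p_inv: float, l_zero: float, mean_gamma_likelihood: float) -> float:
    """p_inv * L(0) + (1 - p_inv) * (average of the gamma-category likelihoods)."""
    return p_inv * l_zero + (1.0 - p_inv) * mean_gamma_likelihood


pi = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
constant = pinv_mixture(0.25, invariant_site_likelihood("AAAA", pi), 0.01)
variable = pinv_mixture(0.25, invariant_site_likelihood("AAGA", pi), 0.01)
```

For the variable site the $p_{inv}$ term vanishes and only the down-weighted gamma part remains, so raising $p_{inv}$ lowers the likelihood of variable sites and raises that of constant ones.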
