dN/dS output trouble in PAML
1
0
Entering edit mode
10.1 years ago
munna.uppal ▴ 10

Hi all,

I'm having an issue with the output of my Codeml ctl file. I'm attempting to calculate a single dN/dS (or, omega) value for an entire alignment of 11 sequences (I know there have been posts in the past asking for similar help, but I haven't been able to find a satisfactory solution).

However, Codeml isn't outputting a single summary statistic, and instead gives me a sorta lower triangular matrix of values. I'm not sure what these values exactly represent but I imagine they're dN/dS values of some sort.


  • The relevant input:

    runmode = 0, seqtype = 1, CodonFreq = 2, model = 0, NSsites = 0
    fix_kappa = 0, kappa = 1, fix_omega = 0, omega = 1, ncatG = 10
    
  • The relevant output (a snapshot):

    Updated_Gene_HN_CI_7_D1
    Updated_Gene_HN_CI_8_D1 0.0278 (0.0046 0.1646)
    Updated_Gene_HN_CI_9_D1 0.0173 (0.0008 0.0440) 0.0320 (0.0046 0.1430)
    Updated_Gene_HN_CI_10_D1 0.0734 (0.0030 0.0414) 0.0328 (0.0050 0.1510) 0.0640 (0.0034 0.0535)
    Updated_Gene_HN_CI_11_D1 0.0291 (0.0030 0.1047) 0.0263 (0.0042 0.1594) 0.0317 (0.0030 0.0961) 0.0491 (0.0034 0.0697)
    

From a helpful resource with sample codeml.ctl code (http://www.researchgate.net/post/How_to_interpret_codeml_output_from_PAML_Phylogenetic_Analysis_by_Maximum_Likelihood), I've been told that I should be getting something like the following:

Detailed output identifying parameters
kappa (ts/tv) = 1.00000
omega (dN/dS) = 1.00000

But this is nowhere to be found in my output file. So my question is, what's going on? What are the values that I'm seeing currently? And why is my .ctl output not giving summary dN/dS or kappa values? Is there perhaps something wrong with my input or am I just missing something?

Any and all help is appreciated.

sequencing sequence gene • 6.6k views
ADD COMMENT
2
Entering edit mode
10.1 years ago
Brice Sarver ★ 3.8k

I've had this happen before in my analyses. This is often the result of an unflagged error in your input file - codeml generates a pairwise matrix, one of the initial steps, but doesn't proceed to the ML estimation of omega. I had this happen a while back when my sample names did not conform exactly to Phylip specifications and again when my samples started with a number. My first suggestion would be to shorten the names to 10 characters or less and see what happens. This is easily accomplished with sed:

cat YOURFILE.fa | sed -e s/Updated_Gene_HN//g > YOURFILE.renamed.fa

This reads the file, finds every instance of "Updated_Gene_HN" and replaces it with nothing, globally, then writes the result to a new file. This can be iterated easily in bash, too, if you need it.

I also needed to do some single-rate (i.e., global) omega calculations a couple of weeks ago. I'm pasting my control file below, though it looks like yours is fine. This file executed over 15,000 times, so I'm confident it works :). I've changed the names of the seqfile and outfile; I also provided my own tree.

seqfile = FASTAFILENAME * sequence data filename
     treefile = tree.phy  * tree structure file name
      outfile = FASTAFILENAME.codeml          * main result file name

        noisy = 9  * 0,1,2,3,9: how much rubbish on the screen
      verbose = 1  * 0: concise; 1: detailed, 2: too much
      runmode = 0  * 0: user tree;  1: semi-automatic;  2: automatic
                   * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise

      seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
    CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
*        ndata = 1
        clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
       aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
   aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                   * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own

        model = 0
                   * models for codons:
                       * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                   * models for AAs or codon-translated AAs:
                       * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                       * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)

      NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                   * 5:gamma;6:2gamma;7:beta;8:beta&w;9:betaγ
                   * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                   * 13:3normal>0

        icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
        Mgene = 0
                   * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                   * AA: 0:rates, 1:separate

    fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
        kappa = 2  * initial or fixed kappa
    fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate
        omega = .5 * initial or fixed omega, for codons or codon-based AAs

    fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
        alpha = 0 * initial or fixed alpha, 0:infinity (constant rate)
       Malpha = 0  * different alphas for genes
        ncatG = 8  * # of categories in dG of NSsites models

        getSE = 1  * 0: don't want them, 1: want S.E.s of estimates
 RateAncestor = 0  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)

   Small_Diff = .5e-6
    cleandata = 0  * remove sites with ambiguity data (1:yes, 0:no)?
*  fix_blength = -1  * 0: ignore, -1: random, 1: initial, 2: fixed
        method = 0   * 0: simultaneous; 1: one branch at a time

* Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
* 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
* 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
* 10: blepharisma nu.
* These codes correspond to transl_table 1 to 11 of GENEBANK.

I hope this resolves your issues. Please respond if you need more help - I use codeml all the time.

ADD COMMENT
0
Entering edit mode

Hi Brice, thank you so much for your response.I tried implementing your code after shortening the name length for each sequence. After running the program, the Terminal tells me the following (which is what it said before changing the code and names, but I forgot to include in my original post):

Seq #1 (CI_7_D1) is missing in the tree

This is bizarre to me, as my tree contains the CI_7_D1 and its branch length when I open it in a text editor.

Overall, I am still getting the same output as before with no summary statistics. That being said, I'm wondering if the problem may lie with my input files. I've been constructing/aligning my sequences in MegAlign and exporting as simple fasta files. I'm constructing the tree in MEGA6, which only allows an export file type of .nwk. I was under the impression that a .nwk was equivalent to a .tree file. Could any of this potentially explain my issue?

ADD REPLY

Login before adding your answer.

Traffic: 1136 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6