Question

How do you run codeml for two genes in one file but with 1 species tree?

6.4 years ago

DNAngel ▴ 250

I want to test selection acting on one gene (set as my foreground) compared to a duplicate of the same gene (left as background), but I'm not sure how to set up my files. I have alignments for the two genes, which means I have to combine them into one alignment file, so all my species will have duplicated names. If I change the duplicated names, then my one species tree won't work, because half of the sequence names won't be found in it.

I tried producing a tree from the combined MSA, and it ended up looking like a smaller tree (for one gene) embedded in the middle of another tree for the second gene. I have no idea how I could unroot it like this...

Concatenating the dataset also does not make sense to me since I'd want to specifically make two partitions, one for each gene. Or is there another way to test for selection strength for duplicated genes?

PAML codeml • 2.1k views

updated 6.4 years ago by shelkmike ★ 1.6k • written 6.4 years ago by DNAngel ▴ 250

Could you please clarify what "my one species tree won't work" means?

6.4 years ago by shelkmike ★ 1.6k

In PAML you typically run one alignment file and one tree file, and every species name in the alignment has to exist in the tree file and match it exactly. So if I want to double my alignment by adding a second gene, I'd have to attach something to the second set of species names, like "sp1-2", "sp2-2", because duplicated names can cause a problem (at least they did when I ran different models on Datamonkey). But once I change the second set of names, they no longer "exist" in the tree file. So it's just this weird loop of problems. Simply concatenating the second gene alignment to my first gene alignment doesn't help either, since I wanted to label the second gene as the foreground.

6.4 years ago by DNAngel ▴ 250

score 0 · Answer 1 · 2019-02-28

If I have understood correctly, you are describing two orthogroups that descend from a pair of paralogs present in the last common ancestor of the studied species, with each species carrying one gene from each orthogroup. In that case, the tree you give PAML should be the gene tree, not the species tree. For example, if species A, B and C form the tree (A,(B,C)) in Newick notation, the gene tree is ((A1,(B1,C1)),(A2,(B2,C2))). The multiple sequence alignment should contain the sequences from both orthogroups, and each sequence name must match the corresponding name in the tree file exactly.
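
As a rough illustration of the renaming described above, here is a minimal Python sketch that builds the gene tree from a species tree and renames the alignment headers to match. The function names (`suffix_leaves`, `gene_tree`, `suffix_fasta_names`) and the `foreground_copy` parameter are hypothetical helpers made up for this example, not part of PAML. It assumes FASTA-style headers (convert to whatever format you feed codeml) and the suffixes "1" and "2". The optional `$1` is, as far as I understand the PAML documentation, its clade-label syntax for marking a whole clade as foreground; check it against the manual for your version.

```python
import re

def suffix_leaves(newick: str, suffix: str) -> str:
    """Append `suffix` to every leaf name of a Newick tree.

    Whitespace and the trailing ';' are removed first. A leaf name is taken
    to be the text that directly follows '(' or ',' (an assumption that
    holds for plain, unquoted names).
    """
    body = re.sub(r"\s+", "", newick).rstrip(";")
    return re.sub(r"(?<=[(,])([^(),:;]+)", lambda m: m.group(1) + suffix, body)

def gene_tree(species_newick: str, foreground_copy=None) -> str:
    """Join two suffixed copies of the species tree under one root.

    If foreground_copy is 1 or 2, that copy's clade gets a '$1' label
    (assumed here to be PAML's clade-label notation).
    """
    copy1 = suffix_leaves(species_newick, "1")
    copy2 = suffix_leaves(species_newick, "2")
    if foreground_copy == 1:
        copy1 += " $1"
    elif foreground_copy == 2:
        copy2 += " $1"
    return f"({copy1},{copy2});"

def suffix_fasta_names(fasta_text: str, suffix: str) -> str:
    """Append `suffix` to the name (first word) of every FASTA header."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            name = line[1:].split()[0]  # drop any description after the name
            out.append(f">{name}{suffix}")
        else:
            out.append(line)
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    # Species tree from the answer; the second copy is marked as foreground.
    print(gene_tree("(A,(B,C));", foreground_copy=2))
    # Rename the headers of the second gene's alignment before merging files.
    print(suffix_fasta_names(">A\nATGAAA\n>B\nATGAAG\n", "2"), end="")
```

The combined alignment is then just the two renamed alignments written into one file, so every name in it (A1, B1, ..., A2, B2, ...) appears exactly once in the tree.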