Replace tip names in a newick file
9.7 years ago
tlorin ▴ 370

Hello everyone :)

I have a collection of newick-formatted files containing gene IDs:

((gene1:1,gene2:1)100:1,gene3:1)100;
((gene4:1,gene5:1)100:1,gene6:1)100;

I have a list of correspondences between gene IDs and species names:

speciesA=(gene1,gene4)
speciesB=(gene2,gene5)
speciesC=(gene3,gene6)

I would like to get the following output (to use Duptree software later):

((speciesA:1,speciesB:1)100:1,speciesC:1)100;
((speciesA:1,speciesB:1)100:1,speciesC:1)100;

Any idea how I could proceed? A bash solution would be ideal :)

Thank you! :)

9.7 years ago

In Python, the solution may look like this:

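import re
import sys

# Map each species name to its list of gene IDs,
# reading lines such as: speciesA=(gene1,gene4)
d = {}
with open(sys.argv[1]) as fh:
    for line in fh:
        m = re.match(r'([^=]+)=\(([^)]+)\)', line.strip())
        if m:  # skip blank or malformed lines
            d[m.group(1)] = m.group(2).split(',')

with open(sys.argv[2]) as fh:
    tree = fh.read()

# Replace whole gene IDs only, so that gene1 does not also match inside gene10
for species, genes in d.items():
    for gene in genes:
        pattern = r'(?<![\w.-])' + re.escape(gene.strip()) + r'(?![\w.-])'
        tree = re.sub(pattern, species, tree)

with open(sys.argv[2] + '.out', 'w') as oh:
    oh.write(tree)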

Assuming the code is saved as gene2species.py, the Newick trees are in tree.newick, and the gene-to-species list is in genes.txt, run the script as follows:

python gene2species.py genes.txt tree.newick

The translated newick tree will be saved in tree.newick.out:

((speciesA:1,speciesB:1)100:1,speciesC:1)100;
((speciesA:1,speciesB:1)100:1,speciesC:1)100;
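
A variant of the same idea, as a minimal sketch: invert the list into a gene-to-species lookup and rewrite all tip labels in a single regex pass, so that an ID such as gene1 is never matched inside gene10 and already-replaced names are not touched again. The function names parse_mapping and rename_tips are hypothetical, and the input is assumed to follow exactly the speciesX=(geneA,geneB) format from the question.

```python
import re

def parse_mapping(lines):
    """Build a gene -> species lookup from lines like 'speciesA=(gene1,gene4)'."""
    gene_to_species = {}
    for line in lines:
        match = re.match(r'\s*([^=\s]+)\s*=\s*\(([^)]*)\)', line)
        if not match:
            continue  # skip blank or malformed lines
        species, genes = match.groups()
        for gene in genes.split(','):
            gene = gene.strip()
            if gene:
                gene_to_species[gene] = species
    return gene_to_species

def rename_tips(newick, gene_to_species):
    """Replace each whole gene label with its species name in one pass."""
    if not gene_to_species:
        return newick
    # Longest IDs first so that 'gene10' wins over 'gene1' in the alternation.
    alternatives = '|'.join(
        re.escape(g) for g in sorted(gene_to_species, key=len, reverse=True))
    # Lookarounds keep a match from starting or ending inside a longer label.
    pattern = re.compile(r'(?<![\w.\-])(?:' + alternatives + r')(?![\w.\-])')
    return pattern.sub(lambda m: gene_to_species[m.group(0)], newick)

mapping = parse_mapping([
    'speciesA=(gene1,gene4)',
    'speciesB=(gene2,gene5)',
    'speciesC=(gene3,gene6)',
])
print(rename_tips('((gene1:1,gene2:1)100:1,gene3:1)100;', mapping))
# prints ((speciesA:1,speciesB:1)100:1,speciesC:1)100;
```

Because rename_tips works on a plain string, it can be called once per file when there is a whole collection of Newick files to translate.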

Thanks so much! :)
