Extract Information From Tree File
11.9 years ago
macmath ▴ 170

Dear Colleagues,

For a given accession in this tree (for example YP_0028009), I need to fetch its associated bootstrap value and the accession of the closest organism, so that I can build a table of such accessions from several tree files. The list of accessions whose bootstrap values need to be looked up is stored in a separate file, and each one is searched for in this tree file. Any suggestions would be much appreciated.

Tree file:

(((((YP_0038120:0.3990275855,ZP_0509506:0.4708403113)16:0.0827617173,((((YP_0028009:0.0983484613,ABK15514:0.1322882846)100:0.2658296649,ZP_1077782:0.6890206276)54:0.1249721135,ZP_0111387:0.3178024438)33:0.0462544520,((YP_433139:0.3823668263,YP_958213:0.3906268190)69:0.1354198970,ZP_0130727:0.3058092646)36:0.1005193390)18:0.0211735847)22:0.0605114450,(((YP_0030742:0.1830349875,YP_526727:0.1042282207)95:0.1357964298,ZP_1013419:0.3702960354)78:0.0718566437,ZP_0950340:0.3886711834)70:0.1073072978)71:0.1546311852,((ZP_0998998:0.0955996985,ZP_1049308:0.0615438616)99:0.3920705702,(YP_0021568:0.3911667388,((((YP_0029865:0.0841201303,AFB73763:0.0283518412)17:0.0000014632,EHB20458:0.0687641880)59:0.0542412622,ZP_0296170:0.3755247695)92:0.1582934488,YP_0050935:0.3509942991)82:0.1264119120)58:0.0310116086)98:0.2431064013)100:1.1356345531,ZP_1115340:0.1260949059,ZP_0206258:0.1548044869);

Output file:

Accession     Closest accession   Bootstrap
YP_0028009    ABK15514            100

Looking forward to your suggestions.

11.9 years ago
Michael 55k

This is the Newick tree format. Search for an existing parser library in your programming language of choice. For Perl, try the Bio::Phylo package. Specify your use case precisely and try to implement it using a small example data set.

http://search.cpan.org/~rvosa/Bio-Phylo-0.52/lib/Bio/Phylo/Manual.pod
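To make the "small example data set" advice concrete, here is a minimal sketch in plain Python (no Bio::Phylo or other library involved). It rests on two assumptions that are not stated in the question: that the "closest accession" means the sister leaves under the same parent node, and that the bootstrap value is the label written after that parent's closing parenthesis. The names `Node`, `parse_newick`, `iter_nodes`, and `closest_and_support` are hypothetical helpers invented for this illustration, and the parser ignores quoted labels and comments. The tree string is an excerpt of the tree in the question.

```python
class Node:
    """One tree node; `name` holds an accession (leaf) or a support value (internal)."""

    def __init__(self, parent=None):
        self.name = ""
        self.length = None
        self.parent = parent
        self.children = []


def parse_newick(text):
    """Parse a plain Newick string (no quoted labels, no comments) into Node objects."""
    text = text.strip().rstrip(";")
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos].strip()

    def read_node(parent):
        nonlocal pos
        node = Node(parent)
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            while True:
                node.children.append(read_node(node))
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                ch = text[pos]
                pos += 1
                if ch == ")":
                    break
                if ch != ",":
                    raise ValueError(f"unexpected character {ch!r}")
        node.name = read_label()  # label after ")" is the support value
        if pos < len(text) and text[pos] == ":":
            pos += 1
            node.length = float(read_label())
        return node

    return read_node(None)


def iter_nodes(node):
    """Yield node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def closest_and_support(root, accession):
    """Return (sister leaf names, parent label), or None if not found or no parent."""
    for node in iter_nodes(root):
        if not node.children and node.name == accession:
            if node.parent is None:
                return None
            sisters = [s for s in node.parent.children if s is not node]
            names = [n.name for s in sisters for n in iter_nodes(s) if not n.children]
            return names, node.parent.name
    return None


# Excerpt of the tree from the question (assumed to be representative).
tree = ("((YP_0028009:0.0983484613,ABK15514:0.1322882846)100:0.2658296649,"
        "ZP_1077782:0.6890206276)54;")
root = parse_newick(tree)
print("Accession", "Closest accession", "Bootstrap", sep="\t")
for acc in ["YP_0028009", "ZP_1077782"]:  # in practice, read these from the list file
    hit = closest_and_support(root, acc)
    if hit is not None:
        names, support = hit
        print(acc, ",".join(names), support, sep="\t")
```

On this excerpt the YP_0028009 row reports ABK15514 with a bootstrap of 100, which matches the row requested in the question. When the sister is itself a clade, the sketch lists every leaf below it, so "closest" becomes ambiguous and you may want to pick by branch length instead.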

11.9 years ago
Leszek 4.2k

If you are familiar with Python, I would recommend ete2 for handling any tree-like structure.
Here are some examples.

11.9 years ago
SES 8.6k

I gave Michael's answer an upvote because I use Bio::Phylo a lot in my work, but I have to suggest you look at the BioPerl HOWTO also. I think the entry level for using the BioPerl tree methods is a bit lower than with Bio::Phylo and for most tasks, Bio::Tree* methods and some scripting will suffice. I like the C implementations recommended here (I'll refer to them for my own work!), but it may be easier to just use the existing code in one of the Bio* packages for your task.

11.9 years ago

If you like using C, the klib API includes a Newick tree format parser in the knhx.c source code that reads a string into a knhx1_t struct.

The main() function in the knhx.c source demonstrates how it can be used with your string, as far as reading it in and walking through parent and children nodes. You could adjust this for your use case, as needed.

11.9 years ago
macmath ▴ 170

Thank you, Michael Dondrup, for your suggestion.

It's nice to say thanks, but since this is not an answer, please post it as a comment. You can also upvote an answer if you like it.

11.9 years ago
nonish5 ▴ 40

MATLAB's phytreeread() function (http://www.mathworks.com/help/bioinfo/ref/phytreeread.html) also reads the Newick format, and getgenbank() and getgenpept() can be used to fetch information for gene and protein accessions, respectively.

11.9 years ago
lh3 33k

An exercise for myself... The following C program is based on knhx.c. To use it:

gcc -g -O2 -Wall get-nei.c -o get-nei
./get-nei '((a,b)10,c)20' a

The output is a 10 b. It actually does more than you need: if the sibling is a subtree rather than a leaf, you will get the entire subtree.

EDIT: put the source code in the gist:4333695.
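For readers without a C toolchain, the behaviour described above can be approximated in a few lines of Python. This is not the code from the gist and does not use knhx.c. It is a sketch of a different technique, matching parentheses directly on the Newick string without building a tree. The function names `split_top_level` and `neighbour` are hypothetical, and the sketch assumes a plain Newick string with no whitespace, quoted labels, or comments.

```python
import re


def split_top_level(text):
    """Split a child list on commas, ignoring commas nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return parts


def neighbour(newick, leaf):
    """Return (parent label, sibling texts) for `leaf`, or None if it has no parent."""
    s = newick.strip().rstrip(";")
    # Match the leaf only as a whole label: after "(" or ",", before ",", ")" or ":".
    m = re.search(r"(?<=[(,])" + re.escape(leaf) + r"(?=[,):])", s)
    if m is None:
        return None
    # Walk left to the "(" that opens the parent clade.
    depth, left = 0, m.start() - 1
    while left >= 0:
        if s[left] == ")":
            depth += 1
        elif s[left] == "(":
            if depth == 0:
                break
            depth -= 1
        left -= 1
    # Walk right to the ")" that closes the parent clade.
    depth, right = 0, m.end()
    while right < len(s):
        if s[right] == "(":
            depth += 1
        elif s[right] == ")":
            if depth == 0:
                break
            depth -= 1
        right += 1
    if left < 0 or right >= len(s):
        return None
    # The label directly after the closing ")" is taken as the bootstrap value.
    support = re.match(r"[^,():]*", s[right + 1:]).group(0)
    children = split_top_level(s[left + 1:right])
    siblings = [c for c in children if c.split(":")[0] != leaf]
    return support, siblings


for query in ["a", "c"]:
    support, siblings = neighbour("((a,b)10,c)20", query)
    print(query, support, " ".join(siblings))
```

For the query a this prints a 10 b, the same line quoted above. For the query c the sibling is a clade, so the whole subtree text (a,b)10 is returned along with the label 20.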

11.9 years ago
macmath ▴ 170

Sincere thanks for all your suggestions
