I need an .obo file with InterPro annotations. From OBO Foundry I know that such a file was available some time ago, but the link to it is now dead.
I've seen that OBO-Edit can create .obo files, even from information extracted from articles (article here), but I haven't found any information on how to transform a text file into .obo.
I wonder if someone knows how to transform the InterPro ParentChildTreeFile, a representative part of which is:
IPR000971::Globin, subset::
--IPR001032::Leghaemoglobin::
--IPR002335::Myoglobin::
--IPR002336::Erythrocruorin::
----IPR011367::Haemoglobin, polymeric::
--IPR002337::Haemoglobin, beta::
--IPR002338::Haemoglobin, alpha::
----IPR002339::Haemoglobin, pi::
------IPR018331::Haemoglobin alpha chain::
into either an .obo file or just a two-column text file in which each term (IPR IDs only) is paired with its direct parent term, e.g. for the above excerpt:
IPR001032 isa: IPR000971
IPR002335 isa: IPR000971
IPR002336 isa: IPR000971
IPR011367 isa: IPR002336
IPR002337 isa: IPR000971
IPR002338 isa: IPR000971
IPR002339 isa: IPR002338
IPR018331 isa: IPR002339
I've tried to build such a file by transforming the ParentChildTreeFile into a file with each full branch on one line, extracting the lines containing ----, swapping the fields so that the last becomes the first and the first becomes the second, and adding "is_a:" between them. I did the same for lines without ---- but with --. The result looks OK (after filtering out lines where the first field equals the second), but then I realized that there are lower-level terms (i.e. starting with more hyphens, up to seven), so it would require further division of the file and repeated grep and awk commands. All in all, it is a very error-prone procedure.
I have written to EBI asking about the interpro.obo file; if I get any information, I'll share it.
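For what it's worth, arbitrary nesting depth can probably be handled in a single pass by keeping a stack of the most recent term seen at each level, instead of running grep/awk once per level. The following is only a minimal sketch in Python, assuming that every level is indented by exactly two hyphens and that fields are separated by "::", as in the excerpt above. The function names (parse_tree, child_parent_pairs) are made up for illustration.

```python
SAMPLE = """\
IPR000971::Globin, subset::
--IPR001032::Leghaemoglobin::
--IPR002335::Myoglobin::
--IPR002336::Erythrocruorin::
----IPR011367::Haemoglobin, polymeric::
--IPR002337::Haemoglobin, beta::
--IPR002338::Haemoglobin, alpha::
----IPR002339::Haemoglobin, pi::
------IPR018331::Haemoglobin alpha chain::
"""


def parse_tree(lines):
    """Yield (term_id, name, parent_id) for each line; parent_id is None for roots.

    Assumption: depth is encoded as two leading hyphens per level.
    """
    stack = []  # stack[d] holds the ID of the latest term seen at depth d
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        stripped = line.lstrip("-")
        depth = (len(line) - len(stripped)) // 2
        fields = stripped.split("::")
        term_id = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        # Drop everything at this depth or deeper; what remains on top is the parent.
        del stack[depth:]
        parent_id = stack[-1] if stack else None
        stack.append(term_id)
        yield term_id, name, parent_id


def child_parent_pairs(lines):
    """Return (child_id, parent_id) tuples, leaving out top-level terms."""
    return [(child, parent) for child, _name, parent in parse_tree(lines)
            if parent is not None]


if __name__ == "__main__":
    for child, parent in child_parent_pairs(SAMPLE.splitlines()):
        print(f"{child} isa: {parent}")
```

For the real file, the same function should accept an open file handle in place of SAMPLE.splitlines(); the local file name is whatever the download was saved as. Each yielded record also carries the term name, so writing [Term] stanzas with id:, name: and is_a: lines for an .obo file would be a small extension, although the exact ID prefix to use there is not something this sketch settles.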
Thanks, it works. I just had to install "gcj-4.5-jdk"; without it there was an error, "Syntax error, parameterized types are only available if source level is 1.5", pointing to <String> in the eighth line.
I use the Oracle/Sun Java compiler. The current release is 7.0; I suppose that your Java compiler was very old.