Remove Tags In Sbml File
13.6 years ago

I'm trying to remove <annotation> tags in an SBML file and combine the start and end tags of the surrounding element if it is empty as a consequence of this removal.


<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" metaid="metaid_0000001" version="1">
<listOfSpecies>
    <species compartment="compartment" id="II_f" initialConcentration="1400" metaid="metaid_0000115" name="Fluid phase Factor II">
        <annotation><content>something</content></annotation>
    </species>
</listOfSpecies>
</sbml>

should transform to:


<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" metaid="metaid_0000001" version="1">
<listOfSpecies>
    <species compartment="compartment" id="II_f" initialConcentration="1400" metaid="metaid_0000115" name="Fluid phase Factor II"/>
</listOfSpecies>
</sbml>

Note that the <annotation> tag can be in listOfReactions/reaction as well. I made a rather basic attempt at an XSLT stylesheet but so far the namespaces confuse me ;-)

Any suggestions on how to do this?

edit: removed the RDF namespace as I did not add it properly before

xml • 3.9k views
13.6 years ago

Solution using Python and libSBML:

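from libsbml import *

# Read the document; "filename.xml" is a placeholder for the input file
doc = SBMLReader().readSBMLFromFile("filename.xml")
model = doc.getModel()

# Drop the annotation on the model itself, then on every species,
# reaction and parameter it contains
model.unsetAnnotation()
for species in model.getListOfSpecies():
    species.unsetAnnotation()
for reaction in model.getListOfReactions():
    reaction.unsetAnnotation()
for param in model.getListOfParameters():
    param.unsetAnnotation()

# Write the stripped document to a new file
writeSBMLToFile(doc, "new_filename.xml")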
13.6 years ago

First, your XML document is missing a namespace declaration:

<sbml xmlns="http://www.sbml.org/sbml/level2"
   xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   level="2" metaid="metaid_0000001" version="1"
   >
...

Second, as far as I understand, you just want to copy the 'species' element in the SBML namespace with all its attributes, but skip its child nodes.


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    xmlns:s="http://www.sbml.org/sbml/level2"
    xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#"

        version='1.0'
        >
<xsl:output method="xml"/>

<xsl:template match="@*|node()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="s:species|s:parameter|s:reaction">
<xsl:element name="{local-name()}" namespace="http://www.sbml.org/sbml/level2">
<xsl:apply-templates select="@*"/>
</xsl:element>
</xsl:template>

</xsl:stylesheet>

xsltproc ~/file.xsl file.xml


<sbml xmlns="http://www.sbml.org/sbml/level2" xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#" level="2" metaid="metaid_0000001" version="1">
<listOfSpecies>
    <species compartment="compartment" id="II_f" initialConcentration="1400" metaid="metaid_0000115" name="Fluid phase Factor II"/>
</listOfSpecies>
</sbml>



Thanks for the answer. When I try "xalan your.xslt my.xml" (xsltproc fails with a library error) no annotations are deleted. Another thing is that the [?] tag can occur inside [?], [?], and [?] tags (and there can be other nested tags that should not be deleted), is there an easy way to account for this as well?


Michael, for the second problem I've updated the stylesheet according to your needs.


For xalan, as far as I can see, the command line requires some options (-IN, -XSL, -OUT, ...): http://xml.apache.org/xalan-j/commandline.html

$ xalan --help

prints

Usage: Xalan [options] source stylesheet

So I think my usage should be fine. I'm going to give it a closer look when I'm on my own machine again, thanks.

13.6 years ago

Here's a generic XSLT solution for removing XML elements. The stylesheet below assumes you want to delete sbml:annotation elements, but that can be easily changed.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sbml="http://www.sbml.org/sbml/level2">

  <xsl:output method="xml" indent="yes"/>

  
  <xsl:strip-space elements="*"/>

  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  
  <xsl:template match="sbml:annotation"/>

</xsl:stylesheet>

This is based on the identity transformation which copies everything as it is. This is a useful starting point for stylesheets where you want the output to be almost like the input and to only make a "small" change, e.g. removing something specific or adding something specific.

The identity transform is overridden by the empty template matching sbml:annotation. That template writes nothing to the output, which removes the matched elements together with everything inside them.

The xsl:strip-space top-level element is used so that whitespace-only text nodes get stripped throughout the whole document. This allows the empty-element tag (combined start and end tag) to be used where an element would contain only whitespace after its child elements are removed. If you want to preserve whitespace-only text nodes elsewhere in the document, you may want to use something more specific than * to specify which elements whitespace-only nodes should be stripped from.
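
For readers without an XSLT processor at hand, the same two ideas (drop every sbml:annotation, then clear whitespace-only text so emptied elements collapse into an empty-element tag) can be sketched in Python with the standard library. This is only an illustration and not part of the stylesheet above: the function name strip_annotations is made up for this example, and the namespace URI is the SBML Level 2 one from the question.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the question's <sbml> root element.
SBML_NS = "http://www.sbml.org/sbml/level2"


def strip_annotations(xml_text: str) -> str:
    """Return xml_text with every SBML <annotation> element removed.

    strip_annotations is a hypothetical helper name for this sketch.
    """
    # Serialise SBML as the default namespace instead of an ns0: prefix.
    ET.register_namespace("", SBML_NS)
    root = ET.fromstring(xml_text)
    annotation_tag = f"{{{SBML_NS}}}annotation"

    # Copy the element lists first, because the tree is modified while walking.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == annotation_tag:
                parent.remove(child)
        # Rough equivalent of xsl:strip-space for childless elements:
        # whitespace-only text is dropped so they serialise as <x ... />.
        if len(parent) == 0 and parent.text is not None and not parent.text.strip():
            parent.text = None

    return ET.tostring(root, encoding="unicode")
```

Because the match is on the namespace-qualified tag, it does not matter whether the annotation sits under species, reaction or any other element; sibling elements that are not annotations are left alone.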

13.6 years ago
Heikki ▴ 360

A solution using Perl and XML::Twig. I usually find XML::Twig easier to write and understand than XSLT (although the code above by Jukka is a model of clarity). Run as 'perl code.pl file.sbml > newfile.sbml':

use Modern::Perl;
use XML::Twig;

my $twig_handlers = {
    # remove a tag conditionally
    'annotation'     => sub { $_->delete if $_->text =~ /something/ },
    # output and free memory
    'listOfSpecies'  => sub { $_[0]->flush } 
};

my $twig = XML::Twig->new(
    TwigHandlers => $twig_handlers,
    KeepEncoding => 1,
    pretty_print => 'indented'
);

my $file = shift;
$twig->parsefile($file);
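
The conditional part of the handler above (delete an annotation only when its text matches a pattern) can also be sketched in Python. This is a rough analogue under assumptions, not a port of XML::Twig: it loads the whole file into memory rather than flushing as it goes, the function name drop_annotations_matching is invented for the example, and joining itertext() is only an approximation of XML::Twig's text method.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI taken from the question's <sbml> root element.
SBML_NS = "http://www.sbml.org/sbml/level2"


def drop_annotations_matching(xml_text: str, pattern: str) -> str:
    """Remove only those <annotation> elements whose text matches pattern.

    drop_annotations_matching is a hypothetical helper name for this sketch.
    """
    ET.register_namespace("", SBML_NS)
    root = ET.fromstring(xml_text)
    annotation_tag = f"{{{SBML_NS}}}annotation"
    regex = re.compile(pattern)

    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != annotation_tag:
                continue
            # Concatenate all text below the annotation, then test the pattern.
            if regex.search("".join(child.itertext())):
                parent.remove(child)

    return ET.tostring(root, encoding="unicode")
```

Calling drop_annotations_matching(text, r"something") would mirror the /something/ test in the Perl handler; annotations whose text does not match are kept.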
13.6 years ago

Hi!

On Linux, using sed, it's quite easy:

sed /^.*annotation.*$/d file.xml > newfile.xml

Ludo


It's about removing everything between the start and end tags, not only the tags ;)


OK, there is a small error in my code. The correct command line is: sed /^.*\<annotation\>.*$/d test > newfile.xml This will remove all the lines containing the tag "<annotation>". However, this won't work if an annotation spans several lines...
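
To illustrate the multi-line limitation mentioned above: a pattern that is allowed to cross line breaks can remove a whole annotation block in one go. The Python sketch below is only a rough text-level workaround under assumptions (annotations are not nested inside each other, and the tag carries no namespace prefix); the name remove_annotation_blocks is made up, and a real XML parser is the safer route. Unlike the XSLT answer, it leaves the surrounding whitespace in place and does not collapse emptied elements.

```python
import re

# re.DOTALL lets "." match newlines, so one match can span several lines.
# The non-greedy ".*?" stops at the first closing tag.
ANNOTATION_RE = re.compile(r"<annotation\b[^>]*>.*?</annotation\s*>", re.DOTALL)


def remove_annotation_blocks(text: str) -> str:
    """Delete each <annotation>...</annotation> block, including its content.

    remove_annotation_blocks is a hypothetical helper name for this sketch.
    """
    return ANNOTATION_RE.sub("", text)
```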
