I'm trying to remove <annotation> tags from an SBML file and combine the start and end tags of the surrounding element if it is empty as a consequence of this removal.
Note that the <annotation> tag can be in listOfReactions/reaction as well. I made a rather basic attempt at an XSLT stylesheet but so far the namespaces confuse me ;-)
Any suggestions on how to do this?
edit: removed the RDF namespace as I did not add it properly before
from libsbml import *

doc = SBMLReader().readSBMLFromFile("filename.xml")
model = doc.getModel()

# remove the annotation on the model itself
model.unsetAnnotation()

# remove the annotations on species, reactions and parameters
for species in model.getListOfSpecies():
    species.unsetAnnotation()
for reaction in model.getListOfReactions():
    reaction.unsetAnnotation()
for param in model.getListOfParameters():
    param.unsetAnnotation()

writeSBMLToFile(doc, "new_filename.xml")
Second, as far as I understand, you just want to copy the tag 'species' in the sbml namespace with all its attributes but you want to skip the child nodes.
Thanks for the answer. When I try "xalan your.xslt my.xml" (xsltproc fails with a library error) no annotations are deleted. Another thing is that the [?] tag can occur inside [?], [?], and [?] tags, is there a way to match all of them?
Thanks for the answer. When I try "xalan your.xslt my.xml" (xsltproc fails with a library error) no annotations are deleted. Another thing is that the [?] tag can occur inside [?], [?], and [?] tags (and there can be other nested tags that should not be deleted), is there an easy way to account for this as well?
Here's a generic XSLT solution for removing XML elements. The stylesheet below assumes you want to delete sbml:annotation elements, but that can be easily changed.
This is based on the identity transformation which copies everything as it is. This is a useful starting point for stylesheets where you want the output to be almost like the input and to only make a "small" change, e.g. removing something specific or adding something specific.
The identity transform is overridden by the empty template matching sbml:annotation, producing nothing to the output, therefore removing the elements.
The xsl:strip-space top-level element is used so that whitespace-only text nodes get stripped throughout the whole document, allowing the empty-element tag (combined start and end tag) to be used where an element would contain only whitespace after the removal. If you want to preserve whitespace-only text nodes somewhere else in the document, you may want to use something more specific than * to specify from which elements whitespace-only nodes should be stripped.
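For readers without a working XSLT processor, the same two steps (drop the annotation elements, then strip whitespace-only text so empty elements collapse) can be sketched with Python's standard library. This is not the stylesheet from the answer, just a rough equivalent using xml.etree.ElementTree. The namespace URI and the sample document are assumptions made up for illustration, and the sketch matches any element whose local name is "annotation", whatever its namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: SBML Level 2 Version 4; adjust to the xmlns used in your file.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def local_name(tag):
    """Return the tag name without its '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def remove_annotations(root):
    """Drop every <annotation> element and blank out whitespace-only leftovers."""
    # Snapshot the elements first so the tree is not modified while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) == "annotation":
                parent.remove(child)
        # No children and only whitespace text: clear the text so the
        # serializer writes an empty-element tag instead of a start/end pair.
        if len(parent) == 0 and parent.text is not None and not parent.text.strip():
            parent.text = None
    return root


# Hypothetical sample input, made up for illustration.
SAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="m">
    <listOfSpecies>
      <species id="s1" compartment="c">
        <annotation><note>drop me</note></annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <annotation><note>drop me too</note></annotation>
        <listOfReactants/>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

ET.register_namespace("", SBML_NS)  # keep SBML as the default namespace on output
cleaned = remove_annotations(ET.fromstring(SAMPLE))
result = ET.tostring(cleaned, encoding="unicode")
print(result)
```

Like xsl:strip-space with *, this clears whitespace-only text in every childless element, not only in those that lost an annotation; restrict the check if that matters for your document.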
A solution using Perl and XML::Twig. I usually find XML::Twig easier to write and understand than XSLT (although the code above by Jukka is a model of clarity). Run as 'perl code.pl file.sbml > newfile.sbml':
use Modern::Perl;
use XML::Twig;

my $twig_handlers = {
    # remove a tag conditionally
    'annotation'    => sub { $_->delete if $_->text =~ /something/ },
    # output and free memory
    'listOfSpecies' => sub { $_[0]->flush }
};

my $twig = XML::Twig->new(
    TwigHandlers => $twig_handlers,
    KeepEncoding => 1,
    pretty_print => 'indented'
);

my $file = shift;
$twig->parsefile($file);
# print whatever is left after the last listOfSpecies was flushed
$twig->flush;
Ok, there is a small error in my code. The correct command line is :
sed '/^.*<annotation>.*$/d' test > newfile.xml
This will remove all the lines containing the tag "<annotation>". However, this won't work if there are several lines of annotation...
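For annotations that span several lines, a regular expression that is allowed to match across newlines is one possible workaround. The following is only a sketch in Python, not part of the sed answer: the function name and the sample text are made up for illustration, it assumes annotation elements are never nested and that no attribute value contains ">", and it does not merge the start and end tags of an element that ends up empty.

```python
import re

# Match a whole <annotation>...</annotation> block (or a self-closing
# <annotation/>), including the indentation before it and the line break after.
ANNOTATION_RE = re.compile(
    r"[ \t]*<annotation\b[^>]*?(?:/>|>.*?</annotation>)[ \t]*\r?\n?",
    re.DOTALL,  # let '.' match newlines so multi-line blocks are covered
)


def drop_annotation_blocks(text):
    """Return text with every annotation block removed."""
    return ANNOTATION_RE.sub("", text)


# Hypothetical sample input, made up for illustration.
sample = (
    '<species id="s1">\n'
    "  <annotation>\n"
    "    <foo/>\n"
    "  </annotation>\n"
    "</species>\n"
)
stripped = drop_annotation_blocks(sample)
print(stripped)
```

Regular expressions do not understand XML, so a parser-based approach is the safer choice for anything beyond a quick cleanup.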
Michael, for the second problem I've updated the stylesheet according to your needs.
For xalan, as far as I can see, the command line requires some options (-IN, -XSL, -OUT ...): http://xml.apache.org/xalan-j/commandline.html
prints
So I think my usage should be fine. Gonna give it a closer look when I'm on my own machine again, thanks.