How to extract specific columns from Drugbank xml file
2
0
Entering edit mode
6.1 years ago
vasilislenis ▴ 160

Hello everyone,

I would like to generate a tab-separated file from DrugBank that will include the following tags:

<drugbank-id> <name> <gene-name> <action>

I have tried to use the xmlstarlet tool by following Lyco's instructions from here:

How To Convert Xml Into A Decent Parseable Format?

but I don't have any result. xmlstarlet doesn't return anything as result (I believe that the xml structure is a little more complicated than his example and I'm not getting any kind of error). I have also tried to change the namespace that drugBank uses but nothing changed.

I have also tried to use the csv files from DrugBank external links which is fine for the name, the id of the drugs and the protein name but they don't include the "action" information.

So, any help would be greatly appreciated...

Thank you very much in advance, Vasilis.

Drugbank xml • 4.3k views
ADD COMMENT
3
Entering edit mode
6.1 years ago

Samuel Lampa wrote something in march

I wrote my version using a streaming xslt tool

ADD COMMENT
0
Entering edit mode

Many thanks, Pierre for your help! I found Samuel's approach a little bit more complicated since you have to install GO language, so followed your approach by tweaking a little bit the xslt template from your example. I would really appreciate it if you could take a look at it and tell me your thoughts cause I am not so familiar with XML.

Thank you very much in advance, Vasilis.


<xsl:stylesheet xmlns:d="&lt;a href=" http:="" www.drugbank.ca"="" rel="nofollow">http://www.drugbank.ca" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="d:drugbank">
<xsl:apply-templates select="d:drug"/>
</xsl:template>

<xsl:template match="d:drug">
<xsl:value-of select="d:name/text()"/>
<xsl:text>      </xsl:text>

<xsl:for-each select="d:targets/d:target/d:polypeptide/d:gene-name">
         <xsl:value-of select=" concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>      </xsl:text>
<xsl:for-each select="d:targets/d:target/d:actions/d:action">
        <xsl:value-of select=" concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>
ADD REPLY
0
Entering edit mode
5.3 years ago
mohfcis ▴ 20

Hi, You can use dbparser package https://github.com/Dainanahan/dbparser, it is designed to parse DrugBank database and return R dataframes

ADD COMMENT

Login before adding your answer.

Traffic: 2539 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6