How to extract specific columns from the DrugBank XML file
6.5 years ago
vasilislenis ▴ 160

Hello everyone,

I would like to generate a tab-separated file from DrugBank that will include the following tags:

<drugbank-id> <name> <gene-name> <action>

I have tried to use the xmlstarlet tool by following Lyco's instructions from here:

How To Convert Xml Into A Decent Parseable Format?

but I don't get any result. xmlstarlet returns nothing and reports no error (I believe the XML structure is a little more complicated than in his example). I have also tried changing the namespace that DrugBank uses, but nothing changed.
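
One possible cause of the silent empty result is the default namespace on the root element: a path written without the namespace matches nothing and raises no error. The sketch below illustrates this in Python rather than xmlstarlet. The one-record `SAMPLE` is a made-up stand-in for the real file, and the namespace URI is the one used elsewhere in this thread.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for a DrugBank record; the real file is far larger.
SAMPLE = """<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
  </drug>
</drugbank>"""

root = ET.fromstring(SAMPLE)

# Without a namespace the path matches nothing, and no error is raised.
plain = root.findall("drug/name")

# Binding a prefix to the namespace URI makes the same path match.
ns = {"d": "http://www.drugbank.ca"}
qualified = [e.text for e in root.findall("d:drug/d:name", ns)]

print(len(plain), qualified)
```

If the namespace URI declared in your copy of the file differs from the one above, the prefix has to be bound to that exact URI instead.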

I have also tried the CSV files from DrugBank's external links, which are fine for the drug name, the drug ID and the protein name, but they don't include the "action" information.

So, any help would be greatly appreciated...

Thank you very much in advance, Vasilis.

Drugbank xml • 4.5k views
6.5 years ago

Samuel Lampa wrote something about this in March.

I wrote my own version using a streaming XSLT tool:

Converting DrugBank to TSV using XSLT and xsltstream: http://lindenb.github.io/jvarkit/XsltStream.html

e.g.:

java -jar dist/xsltstream.jar \
    -n '{http://www.drugbank.ca}drug' \
    -t drugbank2tsv.xsl \
    /path/to/full_database.xml
   
Lepirudin	approved		CHEMBL1201666		46507011
Cetuximab	approved		CHEMBL1201577		46507042
Dornase alfa	approved		CHEMBL1201431		46507792
Denileukin diftitox	approved->investigational		CHEMBL1201550		46506950
Etanercept	approved->investigational		CHEMBL1201572		46506732
Bivalirudin	approved->investigational	OIRCOABEOLEUMC-GEJPAHFPSA-N	CHEMBL2103749	16129704	46507415
Leuprolide	approved->investigational		CHEMBL1201199		46507635
Peginterferon alfa-2a	approved->investigational		CHEMBL1201560		46504860
Alteplase	approved		CHEMBL1201593		46507035
Sermorelin	approved->withdrawn				46507399

drugbank2tsv.xsl:
<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:d="http://www.drugbank.ca" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="d:drugbank">
  <xsl:apply-templates select="d:drug"/>
</xsl:template>

<!-- One tab-separated line per drug:
     name, groups, InChIKey, ChEMBL, PubChem Compound, PubChem Substance -->
<xsl:template match="d:drug">
  <xsl:value-of select="d:name/text()"/>
  <xsl:text>&#9;</xsl:text>
  <!-- groups, joined with an arrow -->
  <xsl:for-each select="d:groups/d:group">
    <xsl:if test='position()>1'>-&gt;</xsl:if>
    <xsl:value-of select="./text()"/>
  </xsl:for-each>
  <xsl:text>&#9;</xsl:text>
  <!-- multiple values are separated by a space; the space must sit inside
       xsl:text, because whitespace-only text elsewhere is stripped -->
  <xsl:for-each select="d:calculated-properties/d:property[d:kind/text()='InChIKey']/d:value">
    <xsl:if test='position()>1'><xsl:text> </xsl:text></xsl:if>
    <xsl:value-of select="./text()"/>
  </xsl:for-each>
  <xsl:text>&#9;</xsl:text>
  <xsl:for-each select="d:external-identifiers/d:external-identifier[d:resource/text()='ChEMBL']/d:identifier">
    <xsl:if test='position()>1'><xsl:text> </xsl:text></xsl:if>
    <xsl:value-of select="./text()"/>
  </xsl:for-each>
  <xsl:text>&#9;</xsl:text>
  <xsl:for-each select="d:external-identifiers/d:external-identifier[d:resource/text()='PubChem Compound']/d:identifier">
    <xsl:if test='position()>1'><xsl:text> </xsl:text></xsl:if>
    <xsl:value-of select="./text()"/>
  </xsl:for-each>
  <xsl:text>&#9;</xsl:text>
  <xsl:for-each select="d:external-identifiers/d:external-identifier[d:resource/text()='PubChem Substance']/d:identifier">
    <xsl:if test='position()>1'><xsl:text> </xsl:text></xsl:if>
    <xsl:value-of select="./text()"/>
  </xsl:for-each>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>


Many thanks for your help, Pierre! I found Samuel's approach a little more complicated, since it requires installing the Go language, so I followed your approach and tweaked the XSLT template from your example. I would really appreciate it if you could take a look at it and tell me your thoughts, because I am not very familiar with XML.

Thank you very much in advance, Vasilis.


<xsl:stylesheet xmlns:d="http://www.drugbank.ca" xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>

<xsl:template match="d:drugbank">
<xsl:apply-templates select="d:drug"/>
</xsl:template>

<xsl:template match="d:drug">
<xsl:value-of select="d:name/text()"/>
<xsl:text>&#9;</xsl:text>

<!-- gene names of all targets, each followed by a comma -->
<xsl:for-each select="d:targets/d:target/d:polypeptide/d:gene-name">
         <xsl:value-of select="concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>&#9;</xsl:text>
<!-- actions of all targets, each followed by a comma -->
<xsl:for-each select="d:targets/d:target/d:actions/d:action">
        <xsl:value-of select="concat(./text(),',')"/>
</xsl:for-each>
<xsl:text>&#10;</xsl:text>
</xsl:template>

</xsl:stylesheet>
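
The template above collects all gene names in one column and all actions in another, so a gene name can no longer be matched to its own action. As an alternative, here is a minimal Python sketch (not XSLT) that streams the file and yields one row per target and action, with the four requested columns. It reuses the element paths from the template; `iter_rows` and `SAMPLE` are hypothetical names, the sample record is made up, and the `primary="true"` attribute on `<drugbank-id>` plus the depth check for nested `<drug>` elements are assumptions about the layout of the full file.

```python
import io
import xml.etree.ElementTree as ET

NS = {"d": "http://www.drugbank.ca"}
DRUG_TAG = "{http://www.drugbank.ca}drug"

def iter_rows(source):
    """Yield (drugbank-id, name, gene-name, action), one tuple per target action."""
    depth = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            continue
        depth -= 1
        # Only handle <drug> elements that are direct children of the root
        # (assumption: the full file may nest <drug> elsewhere).
        if depth != 1 or elem.tag != DRUG_TAG:
            continue
        ids = elem.findall("d:drugbank-id", NS)
        # Prefer the id flagged primary="true" (assumption), else the first one.
        drug_id = next((i.text for i in ids if i.get("primary") == "true"),
                       ids[0].text if ids else "")
        name = elem.findtext("d:name", default="", namespaces=NS)
        for target in elem.findall("d:targets/d:target", NS):
            genes = [g.text or "" for g in
                     target.findall("d:polypeptide/d:gene-name", NS)] or [""]
            actions = [a.text or "" for a in
                       target.findall("d:actions/d:action", NS)] or [""]
            for gene in genes:
                for action in actions:
                    yield (drug_id or "", name, gene, action)
        elem.clear()  # drop the finished record's children to limit memory use

# Made-up stand-in for one DrugBank record.
SAMPLE = b"""<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DB00001</drugbank-id>
    <name>Lepirudin</name>
    <targets>
      <target>
        <actions><action>inhibitor</action></actions>
        <polypeptide><gene-name>F2</gene-name></polypeptide>
      </target>
    </targets>
  </drug>
</drugbank>"""

for row in iter_rows(io.BytesIO(SAMPLE)):
    print("\t".join(row))
```

For the real file, pass the path used in the command earlier in the thread (for example `iter_rows("/path/to/full_database.xml")`) and write the joined rows to a file. Targets without a gene name or an action still produce a row, with that column left empty.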
5.7 years ago
mohfcis ▴ 20

Hi, you can use the dbparser package (https://github.com/Dainanahan/dbparser); it is designed to parse the DrugBank database and return R data frames.
