Biological Databases Geographical Distribution
7.2 years ago

Hi,

I would like to know how biological databases are distributed worldwide. I think most of them are located in the US and Europe. Am I right? Are any other important countries involved?

7.2 years ago

If it helps, using my tools http://lindenb.github.io/jvarkit/XsltStream.html and http://lindenb.github.io/jvarkit/PubmedDump.html, I've extracted the affiliation of the first author of each article in the NAR Database Issue 2015.


@Pierre, I am using your xsltstream to parse an XML file that I downloaded using NCBI E-utilities. I modified the XSL file above (biostar270498.xsl) to get my desired output (title and abstract text). It works fine, but after a certain number of entries (~500) it throws an error. Could you please check whether I have done anything wrong in the XSL or in the way I am using your tool?

Usage: cat ~/Downloads/test_e_renal_kidney.xml | java -jar dist/xsltstream.jar -t ~/Downloads/test_e_kid_renal.xsl -n PubmedArticle

xsl file:

   
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text" encoding="UTF-8"/>
<xsl:template match="/">
<xsl:apply-templates select="PubmedArticle"/>
</xsl:template>
<xsl:template match="PubmedArticle">
<xsl:apply-templates select="MedlineCitation/Article/Abstract/AbstractText"/>
<xsl:text>&#10;</xsl:text>
</xsl:template>
</xsl:stylesheet>

The error I get after about 500 entries of output:

[SEVERE][XsltStream]ParseError at [row,col]:[98160,6]  
Message: The processing instruction target matching "[xX][mM][lL]" is not allowed.  
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[98160,6]  
Message: The processing instruction target matching "[xX][mM][lL]" is not allowed.  
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)  
    at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)  
    at com.github.lindenb.jvarkit.tools.misc.XsltStream.doWork(XsltStream.java:590)  
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMain(Launcher.java:763)  
    at com.github.lindenb.jvarkit.util.jcommander.Launcher.instanceMainWithExit(Launcher.java:926)  
    at com.github.lindenb.jvarkit.tools.misc.XsltStream.main(XsltStream.java:627)  
[INFO][Launcher]xsltstream Exited with failure (-1)  

PS: I am still working on extracting the abstract with the XSL above.

Thank you for your time and thank you for your tool.


What is the output of

xmllint --stream --noout ~/Downloads/test_e_renal_kidney.xml

?


@Pierre Thank you for your response,

Output is:

/home/dell/Downloads/test_e_renal_kidney.xml:98160: parser error : XML declaration allowed only at the start of the document  
<?xml version="1.0" ?>  
     ^  
/home/dell/Downloads/test_e_renal_kidney.xml : failed to parse  

Now I see what the problem is: it is in the XML file itself, which contains the XML declaration line multiple times.

grep -c '?xml version=' ~/Downloads/test_e_renal_kidney.xml
1426

Is there any way I can parse this XML file to get the title and abstract? It is a very large file (7.9 GB).

Thank you for your time
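
One possible way to handle a file like this, sketched in Python with the standard library only (it is not part of xsltstream): a file with 1426 XML declarations looks like many separate XML documents concatenated together, so the sketch skips every repeated declaration and DOCTYPE line and wraps everything in one artificial root element before streaming through it. The function name `iter_title_abstract` and the wrapper element `<merged>` are hypothetical names chosen for illustration. It assumes that each declaration and each DOCTYPE sits on its own single line, that the records do not rely on entities defined in the DTD, and that the title lives at MedlineCitation/Article/ArticleTitle.

```python
import xml.etree.ElementTree as ET


def _text(elem):
    """Return the full text of an element, including text inside inline tags."""
    if elem is None:
        return ""
    return "".join(elem.itertext()).strip()


def iter_title_abstract(lines):
    """Yield (title, abstract) for each PubmedArticle in concatenated XML.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumption: every '<?xml ...?>' and '<!DOCTYPE ...>' occupies one line.
    """
    parser = ET.XMLPullParser(events=("end",))
    # Hypothetical wrapper element: gives the merged stream a single root.
    parser.feed("<merged>")
    for line in lines:
        # Skip the repeated prolog lines that make the raw file ill-formed.
        if line.lstrip().startswith(("<?xml", "<!DOCTYPE")):
            continue
        parser.feed(line)
        for _event, elem in parser.read_events():
            if elem.tag != "PubmedArticle":
                continue
            title = _text(elem.find("MedlineCitation/Article/ArticleTitle"))
            abstract = " ".join(
                _text(part)
                for part in elem.findall(
                    "MedlineCitation/Article/Abstract/AbstractText"
                )
            )
            yield title, abstract
            # Release the finished record so memory use stays low.
            elem.clear()
    parser.feed("</merged>")
    parser.close()


# Usage sketch (the path is the one from the post above):
#
#   with open("test_e_renal_kidney.xml", encoding="utf-8") as handle:
#       for title, abstract in iter_title_abstract(handle):
#           print(title, abstract, sep="\t")
```

Because the file is read line by line and each finished record is cleared, the whole 7.9 GB never has to be held in memory at once. If a declaration ever shares a line with other markup, the line-based filter would need to be replaced by something more careful.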
