PubChem Database Into MySQL
13.0 years ago

Hello everybody,

I hope this is the right forum for this question.

I want to download the PubChem Substance database and put all of the information into a MySQL database. Is this possible, and if so, how?

My second question: is there a script that automatically updates the database?

I didn't find anything about this.

With best regards, Jochen Schreiber

mysql database • 8.4k views
ADD COMMENT
13.0 years ago
Pascal ★ 1.5k

Have a look at moldb5; it shows at least how to download SDF files from PubChem and import them into a MySQL DB.

ADD COMMENT
13.0 years ago

There is an XML schema (XSD) for the PubChem XML files: ftp://ftp.ncbi.nih.gov/pubchem/specifications/pug.xsd

You could generate the tables and import the data with an "XSD to SQL" converter. See http://stackoverflow.com/questions/138575/how-can-i-create-database-tables-from-xsd-files

ADD COMMENT

I want to download SDF files for a list (.xl) of compounds automatically. Is there a Python or R script for this?

ADD REPLY
13.0 years ago

The question is of course why you'd want to do that. As mentioned in your own question, updates are a constant hassle.

There are a couple of interfaces available hiding the complexities of the PUG and EUtils gateways into PubChem, so you can work locally with the current PubChem data as if it were a regular file or local database. That is much more convenient (I am guessing that your queries are not top secret...)
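As one illustration of querying PubChem remotely instead of mirroring it, here is a minimal sketch that fetches SDF records over HTTP. It assumes PubChem's PUG REST service, which is not named in this thread; the helper names `sdf_url` and `fetch_sdf` are made up for this example, and the URL layout should be checked against the current PUG REST documentation before relying on it.

```python
import urllib.request

# Assumption: base URL of PubChem's PUG REST service (not mentioned in the thread).
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sdf_url(ids, domain="substance", namespace="sid"):
    """Build a PUG REST URL that requests SDF output for a list of numeric IDs."""
    id_list = ",".join(str(int(i)) for i in ids)  # int() rejects non-numeric input
    return f"{PUG_REST}/{domain}/{namespace}/{id_list}/SDF"


def fetch_sdf(ids, **kwargs):
    """Download the SDF text for the given IDs (needs network access)."""
    with urllib.request.urlopen(sdf_url(ids, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

For compounds rather than substances, `sdf_url([2244], domain="compound", namespace="cid")` builds the corresponding request. A long ID list would have to be split into batches, and requests should be throttled to stay within PubChem's usage policy.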

ADD COMMENT

Could you provide some additional information and links to some of these interfaces you mention?

ADD REPLY

I have just sent an email with a PowerPoint presentation.

ADD REPLY

But the email address in your profile at genome.wustl.edu bounces. How can I reach you?

ADD REPLY
13.0 years ago
Yogesh Pandit ▴ 520

You can download all the SDF files for the latest PubChem Substance release using any FTP client from

ftp://ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/SDF/

Then you can use ChemAxon's JChem Manager to import all the SDFs into a MySQL database.

http://www.chemaxon.com/jchem/doc/admin/
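The download step can also be scripted instead of using an interactive FTP client. The following is a minimal sketch using Python's standard `ftplib`; it assumes the server still accepts anonymous FTP and that the archives in that directory end in `.sdf.gz`. The function names and the local `sdf` folder are hypothetical.

```python
from ftplib import FTP
from pathlib import Path

HOST = "ftp.ncbi.nlm.nih.gov"
REMOTE_DIR = "/pubchem/Substance/CURRENT-Full/SDF/"


def select_archives(names, suffix=".sdf.gz"):
    """Keep only SDF archives (the suffix is an assumption about the file naming)."""
    return sorted(n for n in names if n.endswith(suffix))


def download_all(dest="sdf", limit=None):
    """Fetch archives that are not yet present locally; returns the names considered."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with FTP(HOST) as ftp:
        ftp.login()  # anonymous login
        ftp.cwd(REMOTE_DIR)
        names = select_archives(ftp.nlst())[:limit]
        for name in names:
            target = dest_dir / name
            if target.exists():
                continue  # already downloaded in an earlier run
            part = dest_dir / (name + ".part")
            with open(part, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)
            part.rename(target)  # only complete files get the final name
    return names
```

Because existing files are skipped, re-running `download_all()` only fetches what is missing, which is one simple starting point for the update script asked about in the question. It does not detect archives that changed on the server.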

ADD COMMENT

ChemAxon is commercial software, and with their academic license you are not allowed to create "shared databases"...

ADD REPLY
8.6 years ago
ostrokach ▴ 350

There is only one practical way to import large amounts of data into a standard database (e.g. MySQL, PostgreSQL):

  • process the data to create CSV files that your database can understand
  • use the LOAD DATA LOCAL INFILE ... command (or equivalent) to load those files into the database

If your data comes as XML (:o), you have to process it with an XML library such as lxml in Python and create CSV files that contain all the information you need. "XSD to SQL" converters don't work with the complex schemas that most XML files use, and XML databases (e.g. BaseX, eXist) are immature and have limits on the size of the files that you can import.

The same applies to SDF files.
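The two steps above could look roughly like this for SDF input. This is a sketch under assumptions, not a tested pipeline: the chosen tags, the table name `substance`, and its columns are placeholders, and the molblock (the structure itself) is ignored so that only the `> <TAG>` data items end up in the CSV.

```python
import csv
import io

# Assumption: which SDF data tags to export; adjust to the fields you need.
COLUMNS = ["PUBCHEM_SUBSTANCE_ID", "PUBCHEM_EXT_DATASOURCE_NAME"]


def iter_sdf_records(lines):
    """Yield a {tag: value} dict for each SDF record (records end with '$$$$')."""
    fields, tag, value = {}, None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):
            if tag is not None:
                fields[tag] = "\n".join(value)
            yield fields
            fields, tag, value = {}, None, []
        elif line.startswith(">") and "<" in line:
            # Data header such as "> <PUBCHEM_SUBSTANCE_ID>"
            if tag is not None:
                fields[tag] = "\n".join(value)
            tag = line[line.index("<") + 1:line.rindex(">")]
            value = []
        elif tag is not None:
            if line.strip():
                value.append(line)
            else:
                # A blank line closes the current data item
                fields[tag] = "\n".join(value)
                tag = None


def write_csv(records, columns, out):
    """Write one CSV row per record; missing tags become empty fields."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for rec in records:
        writer.writerow([rec.get(col, "") for col in columns])
        count += 1
    return count


# Hypothetical table and column names; run this statement from a MySQL client.
LOAD_SQL = r"""
LOAD DATA LOCAL INFILE 'substances.csv'
INTO TABLE substance
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(sid, source_name);
"""
```

The PubChem archives are gzip-compressed, so in practice the input would be something like `gzip.open(path, "rt")` passed to `iter_sdf_records`, with the output going to a file opened with `newline=""`. Values containing backslashes may need an explicit `ESCAPED BY` clause in the `LOAD DATA` statement.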

ADD COMMENT
