How can I get data from the geoprofiles database parsed in a sane way? For example, after a search I get a result with a couple of IDs. Let's say I want to download 64663643 (http://www.ncbi.nlm.nih.gov/geoprofiles/64663643). Specifically, I'd like to get the GDS summary from it.
But after doing the standard:
Bio.Entrez.read(Bio.Entrez.esummary(db='geoprofiles', id='64663643'))
I get DTD errors (missing tag ENTREZ_GENE_ID). If I try without validation, I get a lot of data without a proper structure:
{u'DocumentSummarySet': ListElement([ListElement(['3682', '2896', 'fKTC', 'zFJA', 'Thiamine supplementation effect on non-insulin-dependent diabetes model: liver', 'Rattus norvegicus', '476602p1p1p1', 'Expression profiling by array', 'count', '46103', 'Gja7', 'gap junction membrane channel protein alpha 7', '', '', 'Rattus norvegicus gap junction channel protein connexin 45 mRNA, partial cds', 'AF536559.4', '', '', '', '', '476602p1;476604p1', '9;9', '5.231620', '346.305270', '', '22500', '0', '88', '30'], attributes={u'uid': u'64663643'})], attributes={u'status': u'OK'})}
What should I do differently to get a proper parsed result?
I don't have an answer to your question, but may I ask: if you want to retrieve data from a GDS, why not use the GDS ID (GDS3682) directly? You may also find this website and its related SQLite DB useful: http://gbnci.abcc.ncifcrf.gov/geo/index.php. Julien
I'm going to do that, but first I need to get the GDS id from the geoprofiles entry.
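As a stopgap, the GDS ID can probably be pulled out of the unvalidated result by position. This is only a rough sketch: it assumes the first value of the document summary ('3682' in the output above) is the GDS number, which matches the GDS3682 mentioned in the comment but is not something the DTD confirms. The function name `gds_accession_from_summary` is made up for illustration.

```python
def gds_accession_from_summary(record):
    """Build a GDS accession (e.g. 'GDS3682') from an unvalidated
    geoprofiles esummary record.

    ASSUMPTION: the first value of the first document summary is the
    GDS number. This is inferred from the example output, not from the DTD.
    """
    # The unvalidated record is a dict holding a list of summaries,
    # and each summary is itself a flat list of string values.
    summaries = record["DocumentSummarySet"]
    first_summary = summaries[0]
    gds_number = first_summary[0]

    # Guard against the positional assumption silently breaking.
    if not gds_number.isdigit():
        raise ValueError("Unexpected value in GDS position: %r" % gds_number)
    return "GDS" + gds_number
```

Here `record` would be whatever `Bio.Entrez.read(handle, validate=False)` returns for the esummary call above. Since the lookup is purely positional, it is worth checking the result against the geoprofiles web page for a few entries before relying on it.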