Is there a (relatively automatic) method for quantifying the datasets in, for example, ArrayExpress or Gene Expression Omnibus?
I would like to compute some statistics on the number of datasets per platform, tissue type, species, etc.
If not, what would be the best way to do it?
Example
ArrayExpress:
http://www.ebi.ac.uk/arrayexpress/browse.html
I would like to filter experiments from this type of listing and count the datasets, e.g. 4 ChIP-seq datasets for Mus musculus, 3 RNA-seq datasets for Homo sapiens, etc.
@Alastair Kerr - I've had a brief look and will look more carefully. However, it seems that it lets you search for individual genes. I would like to have some statistics on the entire set of expression datasets stored in the public repositories, e.g. 4 datasets: expression data, microarray, small non-coding RNAs; 2 datasets: expression data, microarray, small ncRNAs, Mus musculus; etc.
*small ncRNAs, Homo sapiens in the first example
If you want a measure of the database and not of a gene, then yes. Let me edit the answer.
@Alastair Kerr - it is just perfect, thanks. I couldn't find anything similar for ArrayExpress, though. Do you know of anything?
@Alastair Kerr - Also, I am assuming that exporting to a CSV file and then processing it is the only possible way to do this, right?
As far as I know, yes, as I have never seen summary data for ArrayExpress. I would contact their helpdesk (miamexpress@ebi.ac.uk) first, as they may have programmatic access.
@Alastair Kerr - thanks
There is programmatic access to ArrayExpress - http://www.ebi.ac.uk/fg/doc/help/programmatic_access.html. However, it isn't great. They told me that they've exposed their internal API for public use, but it is far from production-ready.
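For the export-and-process route mentioned in the comments, here is a minimal sketch of the counting step in Python. It assumes you have exported the experiment listing to a CSV file with one row per dataset. The column names ("Type", "Organism") and the sample rows are made up for illustration, so adjust them to whatever the real export contains.

```python
import csv
import io
from collections import Counter


def count_datasets(rows, keys):
    """Count rows per unique combination of the given column names."""
    counts = Counter()
    for row in rows:
        counts[tuple(row[k].strip() for k in keys)] += 1
    return counts


# Hypothetical export: accessions, column names and values are invented
# for illustration and do not come from ArrayExpress or GEO.
SAMPLE_EXPORT = """\
Accession,Type,Organism
E-EXMP-1,ChIP-seq,Mus musculus
E-EXMP-2,ChIP-seq,Mus musculus
E-EXMP-3,RNA-seq,Homo sapiens
E-EXMP-4,RNA-seq,Homo sapiens
E-EXMP-5,RNA-seq,Homo sapiens
E-EXMP-6,RNA-seq,Mus musculus
"""


def main():
    # For a real file, replace io.StringIO(...) with open("export.csv", newline="").
    reader = csv.DictReader(io.StringIO(SAMPLE_EXPORT))
    counts = count_datasets(reader, ["Type", "Organism"])
    for (exp_type, organism), n in counts.most_common():
        print(f"{n} datasets: {exp_type}, {organism}")


if __name__ == "__main__":
    main()
```

Passing a different list of column names (for example a platform or tissue column, if the export has one) gives the counts per that combination instead.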