Hello,
I would like to retrieve all the sequencing data for a specific organism and/or experiment type from ArrayExpress. I looked into the REST queries here and built the following request:
https://www.ebi.ac.uk/arrayexpress/xml/v2/experiments?query="Escherichia+coli+K-12"AND"ChIP-seq"
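For reference, here is a minimal Python sketch of how I assemble that request. The function name `build_query_url` is my own and not part of any ArrayExpress client. The only difference from the URL above is that the double quotes are percent-encoded, which I am assuming the server treats the same way.

```python
from urllib.parse import quote_plus

# Endpoint taken from the request above.
BASE = "https://www.ebi.ac.uk/arrayexpress/xml/v2/experiments"


def build_query_url(*terms):
    """Quote each term and join the terms with AND, as in my hand-written URL."""
    query = "AND".join(f'"{term}"' for term in terms)
    # quote_plus turns spaces into '+' and double quotes into %22.
    return f"{BASE}?query={quote_plus(query)}"


if __name__ == "__main__":
    print(build_query_url("Escherichia coli K-12", "ChIP-seq"))
```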
With the accession number of each experiment, I can download a table (the SDRF file) summarizing its samples:
http://www.ebi.ac.uk/arrayexpress/files/<accession>/<accession>.sdrf.txt
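This is roughly how I build that address and read the table once it is downloaded. It is only a sketch: I am assuming the file is plain tab-delimited text, the helper names `sdrf_url` and `parse_sdrf` are mine, and the accession in the example is a made-up placeholder.

```python
import csv
import io


def sdrf_url(accession):
    """Fill the accession into the URL pattern shown above."""
    return f"http://www.ebi.ac.uk/arrayexpress/files/{accession}/{accession}.sdrf.txt"


def parse_sdrf(text):
    """Split tab-delimited text into (header, rows).

    Lists are used rather than dicts so that repeated column names are kept.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return [], []
    return rows[0], rows[1:]


if __name__ == "__main__":
    # "E-XXXX-0000" is a hypothetical accession, not a real experiment.
    print(sdrf_url("E-XXXX-0000"))
    header, rows = parse_sdrf("Source Name\tOrganism\nsample 1\tEscherichia coli\n")
    print(header, rows)
```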
However, the column names in this table are not fixed. I need either the SRR and SRX identifiers, or the ERR one, to reach the SRA or fastq files:
ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX189/SRX189776/SRR576936/SRR576936.sra
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR576/SRR576936/SRR576936.fastq.gz
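One workaround I am considering is to ignore the column names entirely and scan every cell of the table for strings that look like run or experiment accessions, then fill them into the two URL layouts above. The sketch below does that. Everything in it is an assumption on my side: the regular expression, the helper names, the column names in the sample table, and above all the directory layout, which I inferred only from the two example URLs. Other accessions (longer run numbers, paired-end files) may well not follow it.

```python
import csv
import io
import re

# SRR/ERR/DRR = runs, SRX/ERX/DRX = experiments (pattern is my own guess).
ACCESSION = re.compile(r"\b[SED]R[RX]\d+\b")


def find_accessions(table_text):
    """Return (runs, experiments) found in any cell of a tab-delimited table."""
    runs, experiments = [], []
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        for cell in row:
            for acc in ACCESSION.findall(cell):
                target = runs if acc[2] == "R" else experiments
                if acc not in target:  # keep first-seen order, no duplicates
                    target.append(acc)
    return runs, experiments


def ena_fastq_url(run):
    """Layout inferred from the single fastq example above."""
    return f"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{run[:6]}/{run}/{run}.fastq.gz"


def ncbi_sra_url(experiment, run):
    """Layout inferred from the single .sra example above."""
    return (
        "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/"
        f"{experiment[:3]}/{experiment[:6]}/{experiment}/{run}/{run}.sra"
    )


if __name__ == "__main__":
    # Illustrative table; the column names are invented for the example.
    sample = "Source Name\tComment[RUN]\tComment[EXPERIMENT]\nsample 1\tSRR576936\tSRX189776\n"
    runs, experiments = find_accessions(sample)
    print(ena_fastq_url(runs[0]))
    print(ncbi_sra_url(experiments[0], runs[0]))
```

This still does not tell me which run belongs to which experiment when a table has several of each, so it is only a partial answer to my own question.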
I would also like to do the same from GEO, but for that I need the GSE and GSM identifiers of the experiments, and I can't find them reliably either. This page seems useful, but it doesn't explain how to construct a query from scratch.
Overall, I'm completely confused by all the different types of identifiers and how they relate to each other...