Get fastq/sra from ArrayExpress and/or GEO programmatically for specific organism/experiment type
7.9 years ago
rioualen ▴ 750

Hello,

I would like to get all the sequencing data for a specific organism and/or experiment type from ArrayExpress. I looked into REST queries here, and built the following request:

https://www.ebi.ac.uk/arrayexpress/xml/v2/experiments?query="Escherichia+coli+K-12"AND"ChIP-seq"

If I get the accession number from each experiment, I can get a table summarizing the samples:

http://www.ebi.ac.uk/arrayexpress/files/<accession>/<accession>.sdrf.txt

However, the fields don't have fixed names. I need to get either the SRR and SRX identifiers, or the ERR one, in order to reach the SRA files or fastq files:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX189/SRX189776/SRR576936/SRR576936.sra
ftp.sra.ebi.ac.uk/vol1/fastq/SRR576/SRR576936/SRR576936.fastq.gz
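Since the column names vary, one possible workaround is to ignore the headers and scan every line of the SDRF table for strings that look like run (SRR/ERR/DRR) and experiment (SRX/ERX/DRX) accessions. The sketch below does that and then rebuilds the two paths shown above. The function names are hypothetical, and the URL layout is copied from these two examples only, with an `ftp://` prefix assumed for the EBI path. Longer accessions, paired-end files (`_1`/`_2` suffixes) or later changes to either server's layout are not handled.

```python
import re

# Accession patterns: S/E/D prefix (NCBI, EBI, DDBJ) followed by RR or RX and digits.
RUN_RE = re.compile(r"\b[SED]RR\d{6,}\b")
EXP_RE = re.compile(r"\b[SED]RX\d{6,}\b")


def runs_from_sdrf(sdrf_text):
    """Map each run accession found in the SDRF text to the first
    experiment accession on the same line (or None if there is none)."""
    found = {}
    for line in sdrf_text.splitlines():
        experiments = EXP_RE.findall(line)
        for run in RUN_RE.findall(line):
            found.setdefault(run, experiments[0] if experiments else None)
    return found


def ena_fastq_url(run):
    """Assumed layout, taken from the single-end example in the post."""
    return f"ftp://ftp.sra.ebi.ac.uk/vol1/fastq/{run[:6]}/{run}/{run}.fastq.gz"


def ncbi_sra_url(run, experiment):
    """Assumed layout, taken from the ByExp example in the post."""
    return (
        "ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/"
        f"{experiment[:3]}/{experiment[:6]}/{experiment}/{run}/{run}.sra"
    )


if __name__ == "__main__":
    # Hypothetical two-column excerpt; real SDRF files have many more columns.
    sample = "Experiment\tRun\nSRX189776\tSRR576936\n"
    for run, experiment in runs_from_sdrf(sample).items():
        print(ena_fastq_url(run))
        if experiment is not None:
            print(ncbi_sra_url(run, experiment))
```

Matching on the accession pattern rather than on a column name sidesteps the naming problem, at the cost of assuming that a run and its experiment always sit on the same row.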

I would also like to do it from GEO, but then I need the GSE & GSM identifiers from the experiments, and I can't find them reliably either. This page seems useful but it doesn't say how to construct a query from scratch.
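For the GEO side, one route that might work is NCBI's E-utilities `esearch` endpoint against the `gds` database, restricting results to series with the `gse[Entry Type]` filter. The sketch below only builds the query URL and does not send it. The function name and the default `retmax` value are my own choices, and using the free-text keyword "ChIP-seq" to select the experiment type is an assumption that may miss or over-match some series.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def geo_series_search_url(organism, keyword, retmax=500):
    """Build an esearch URL for GEO series (GSE) matching an organism
    and a free-text keyword. The keyword filter is a rough assumption."""
    term = f'"{organism}"[Organism] AND {keyword} AND gse[Entry Type]'
    params = {"db": "gds", "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(geo_series_search_url("Escherichia coli", "ChIP-seq"))
```

As far as I understand, `esearch` returns internal UIDs rather than GSE accessions, so a second call (for example `esummary` on the same database) would be needed to turn those UIDs into GSE and GSM identifiers.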

Overall, I'm completely lost among all the different types of identifiers and how they connect to each other...

arrayexpress fastq sra geo • 2.9k views