I am trying to download the entire dataset for a BioProject using esearch and efetch from the Entrez Utilities.
My syntax is based on the one posted by @Istvan Albert at C: How to download raw sequence data from GEO/SRA, which is:
esearch -db sra -query PRJNA40075 | efetch --format runinfo | cut -d ',' -f 1 | grep SRR | head -5 | xargs fastq-dump -X 10 --split-files
For the BioProject I am interested in, PRJNA269201, a slightly truncated version, shown below, creates 144 empty files as expected:
esearch -db sra -query PRJNA269201 | efetch --format runinfo | cut -d ',' -f 1 | grep SRR | xargs touch
However, when I try the full-length syntax, it behaves differently from what I expected under both scenarios 1 and 2 detailed below:
Scenario 1. On the head node of a cluster:
esearch -db sra -query PRJNA269201 | efetch --format runinfo | cut -d ',' -f 1 | grep SRR | head -2 | xargs fastq-dump --split-files
One file finished downloading, but it is 5.5 GB, which is far larger than the 1.2 GB I expected based on the info at this link. Is this difference due to file compression? How can I download a more compressed version, both for storage and for downstream RNA-Seq analyses?
-rw-rw-r-- 1 aksrao aksrao 1.1G Jan 19 19:47 SRR1726554_1.fastq
-rw-rw-r-- 1 aksrao aksrao 5.5G Jan 19 19:44 SRR1726553_1.fastq
Scenario 2. When I submit this as a shell script, the STDERR stream (SLURM queue management on an Ubuntu cluster) captures the following error message:
2019-01-20T02:28:55 fastq-dump.2.8.2 err: param empty while validating argument list - expected accession
This same problem was reported on the original post by user @bandanaschapagain, but it does not appear to have been answered or resolved, so I am posting it afresh. Could someone please help me? Thank you!
I would always avoid using fastq-dump to load files directly from the SRA, as it tends to be unstable. It is better to download the SRA files to disk with prefetch and then run fastq-dump on them, given that the data are not backed up at the ENA in fastq format directly.
No need for wait. GNU Parallel does that for you.
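To make the prefetch-then-convert suggestion concrete, here is a minimal sketch, not a tested pipeline. The function names extract_runs and fetch_runs and the file runs.txt are made up for illustration. It assumes prefetch and fastq-dump from sra-tools are on the PATH, and that fastq-dump can resolve a prefetched accession from the local cache. The --gzip flag writes compressed FASTQ, which should help with the size difference: the .sra archive is compressed, while plain FASTQ is uncompressed text. The empty-list check is there because GNU xargs still runs its command once when it receives no input, which would produce the "expected accession" error. Why the list would be empty inside a SLURM job is a guess on my part (for example, no outbound network access from the compute nodes).

```shell
#!/usr/bin/env bash
# Sketch only: prefetch each run to disk, then convert it to gzipped FASTQ.
# extract_runs, fetch_runs and runs.txt are hypothetical names.
set -euo pipefail

# Read a runinfo CSV on stdin and print the run accessions (SRR/ERR/DRR).
extract_runs() {
    cut -d ',' -f 1 | grep -E '^[SED]RR[0-9]+$' || true
}

# Download and convert every accession listed in the file given as $1.
fetch_runs() {
    local list="$1"
    # Stop early on an empty list instead of calling the tools with no accession.
    if [ ! -s "$list" ]; then
        echo "No run accessions found; can this node reach NCBI?" >&2
        return 1
    fi
    while read -r run; do
        # Keep the tools from consuming the accession list on stdin.
        prefetch "$run" < /dev/null
        fastq-dump --split-files --gzip "$run" < /dev/null
    done < "$list"
}

# Example usage, with the BioProject from the question:
#   esearch -db sra -query PRJNA269201 | efetch -format runinfo | extract_runs > runs.txt
#   fetch_runs runs.txt
#
# With GNU Parallel instead of the loop (no wait needed):
#   parallel -j 2 'prefetch {} && fastq-dump --split-files --gzip {}' :::: runs.txt
```

Building runs.txt on the head node first and submitting only the download step to SLURM keeps the part that needs the Entrez servers separate from the part that does the heavy lifting.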