Downloading all runs using fastq-dump
3.6 years ago
BDK_compbio ▴ 140

Is there a way to download all runs for an SRA experiment ID such as https://www.ncbi.nlm.nih.gov/sra/?term=SRX724870 ? At the moment I search the SRA manually on the NCBI site and run fastq-dump on each of the runs. For example, I run the following three commands, because fastq-dump -I --split-files SRX724870 gives errors.

fastq-dump -I --split-files SRR1602552
fastq-dump -I --split-files SRR1602553
fastq-dump -I --split-files SRR1602554

I have a list of SRA IDs that I currently look up by hand before running fastq-dump on each run. It would be great if I could download all runs using just the SRA ID (e.g. SRX724870).

fastq-dump SRAtoolkit • 3.4k views

Just enter the query at sra-explorer (find SRA and FastQ download URLs in a couple of clicks) and you get download links for the FASTQ files right away.

3.6 years ago
Sukjun Kim ▴ 90

As far as I remember, automatic expansion of container accessions is not currently available in sratoolkit.

Why don't you try this short bash script?

It automatically retrieves all SRR run accessions for the SRX identifier and downloads the corresponding runs.

#!/bin/bash

srx_id=SRX724870
# The outer parentheses make sra_ids an array, one run accession per element.
sra_ids=($(wget -qO- "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=${srx_id}" | grep "${srx_id}" | cut -f1 -d","))

for sra_id in "${sra_ids[@]}"; do
    fastq-dump "${sra_id}"
done
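
Since the question mentions a whole list of SRX IDs, the same idea can be wrapped in a loop over a file. This is only a sketch under some assumptions: the function names extract_runs and download_experiments and the one-SRX-per-line input file are made up for illustration, and it assumes the run-info URL from the script above still responds and that wget and fastq-dump are on PATH.

```shell
#!/bin/bash

# Print the run accessions (first CSV column) found in run-info CSV on stdin.
# The header line and blank lines are dropped because they do not look like
# a run accession (SRR, ERR or DRR followed by digits).
extract_runs() {
    cut -d, -f1 | grep -E '^[SED]RR[0-9]+$'
}

# Download every run of every experiment listed in a file (one SRX per line).
# The body runs in a subshell, so its variables do not leak to the caller.
download_experiments() (
    list_file=$1
    while read -r srx_id; do
        # Skip empty lines in the list file.
        [ -z "${srx_id}" ] && continue
        wget -qO- "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=${srx_id}" \
            | extract_runs \
            | while read -r run; do
                  # Keep fastq-dump away from the loop's stdin.
                  fastq-dump -I --split-files "${run}" < /dev/null
              done
    done < "${list_file}"
)
```

With a file such as srx_list.txt containing one accession per line, the call would be download_experiments srx_list.txt. Reading the runs with "while read" hands each accession to fastq-dump as a separate argument.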

It gives the following error:

2021-06-10T05:07:51 fastq-dump.2.9.1 err: item not found while constructing within virtual database module - the path 'SRR1602552 SRR1602553 SRR1602554' cannot be opened as database or table


I think the error occurred because line 8 of your script reads like this:

    fastq-dump "${sra_ids}"

That would produce the following command line, with all three accessions passed as a single argument:

$ fastq-dump "SRR1602552 SRR1602553 SRR1602554"

So you should fix the code by using the loop variable ${sra_id} instead of ${sra_ids}:

    fastq-dump "${sra_id}"

The unquoted form also works:

    fastq-dump ${sra_id}

It will produce one command line per run, as follows:

$ fastq-dump SRR1602552
$ fastq-dump SRR1602553
$ fastq-dump SRR1602554

I hope this solves the problem.
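
To see the difference between the quoted and unquoted forms without touching fastq-dump, here is a small sketch. The helper count_args is made up for illustration and only reports how many arguments it receives; the default IFS (whitespace) is assumed.

```shell
#!/bin/bash

# Report how many separate arguments a command would receive.
count_args() {
    echo "$#"
}

sra_ids="SRR1602552 SRR1602553 SRR1602554"

# Quoted: the whole string is passed as one argument, so fastq-dump would
# look for a single accession named "SRR1602552 SRR1602553 SRR1602554".
quoted_count=$(count_args "${sra_ids}")

# Unquoted: the shell splits the string on whitespace into three arguments.
unquoted_count=$(count_args ${sra_ids})

echo "quoted: ${quoted_count} argument(s), unquoted: ${unquoted_count} argument(s)"
```

This prints "quoted: 1 argument(s), unquoted: 3 argument(s)", which matches the path shown in the error message: the three accessions arrived as one name.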

3.6 years ago
Gregor Rot ▴ 540

This script, which uses E-utilities, should also work:

if ! type efetch > /dev/null 2>&1; then
  echo "Please install E-utilities (Entrez Direct)." >&2
  exit 1
fi
GSM=$1  # accession to look up, passed as the first argument
echo "Retrieving runs for ${GSM} from NCBI..."
all_data=$(esearch -db sra -query "$GSM" | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc)
for SRR in ${all_data}
do
  echo "processing ${SRR}"
  fastq-dump -A "$SRR"
done

I am afraid I may not be doing things correctly. I just copied the script into script1.sh and ran sh script1.sh, and it gives the following error:

script1.sh: line 1: type: efetch: not found
script1.sh: line 2: print: command not found
retrieves from NCBI GEO.....
script1.sh: line 7: esearch: command not found
script1.sh: line 7: efetch: command not found
script1.sh: line 7: xtract: command not found
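
The "command not found" lines mean that esearch, efetch and xtract (the Entrez Direct tools) could not be found on PATH, and the blank before "retrieves" shows that no accession was passed as the first argument. A quick way to confirm which tools are missing before running anything is sketched below; the function name missing_tools is made up for illustration.

```shell
#!/bin/bash

# Print the name of every listed command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "${tool}" > /dev/null 2>&1 || echo "${tool}"
    done
}

# Check the tools the script above depends on and report any that are absent.
missing=$(missing_tools esearch efetch xtract fastq-dump)
if [ -n "${missing}" ]; then
    echo "Not found on PATH:" ${missing} >&2
fi
```

If this prints nothing, all four commands are available, and the script can be run with the accession as its argument, e.g. sh script1.sh SRX724870.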