Question

Downloading SRA data using the SRA Toolkit

0

10.0 years ago

zizigolu ★ 4.3k

Hey,

I opened cmd and typed C:\Users\yang\Downloads\Compressed\sratoolkit.2.5.0-win64\bin

Then I typed fastq-dump -X 5 -Z SRR390728

This is what I see in cmd:

C:\Users\yang\Downloads\Compressed\sratoolkit.2.5.0-win64\bin\fastq-dump -X 5 -Z SRR390728
Read 5 spots for SRR390728
Written 5 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96;;;;(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4;;5;;;;;;;;;;;;;;;;;;;;;;;;9;;;;;;;464262

(.....)

Where can I find the downloaded file? I was trying to download an SRA file from GEO.

RNA-Seq rna-seq sequencing • 15k views

ADD COMMENT • link updated 5.9 years ago by Drew.Judell • 0 • written 10.0 years ago by zizigolu ★ 4.3k

ATpoint · Answer 1 · 2019-03-08

3

6.2 years ago

GenoMax 151k

Since this old thread has been re-activated, I will add two current methods that should be used for this purpose instead of the SRA Toolkit:

ADD COMMENT • link 6.2 years ago by GenoMax 151k

0

GenoMax, as you already suggested to me, "Fast download of FASTQ files from the European Nucleotide Archive (ENA)" is quite applicable here.

ADD REPLY • link updated 6.2 years ago by ATpoint 88k • written 6.2 years ago by zizigolu ★ 4.3k

0

This tutorial also contains a section on how to use the sratoolkit (prefetch and fastq-dump) efficiently (bottom of the tutorial).

ADD REPLY • link 6.2 years ago by ATpoint 88k

score 0 · Answer 2 · 2015-05-16

0

10.0 years ago

Devon Ryan 105k

While you could download the file from GEO (or just directly from SRA), it would seem easier to just remove the -Z option and allow fastq-dump to write the fastq files for you. Exceedingly few programs directly accept SRA files, but pretty much everything will take fastq.

BTW, that's a paired-end dataset, so make sure you specify --split-files.
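To spell out the point above: -Z sends the reads to the terminal (stdout), so no file is created, which is why there is nothing to find on disk. A minimal sketch of the alternative follows. The `run` helper, the `RUN` variable and the `fastq_out` directory are hypothetical names made up for this example, and it assumes fastq-dump is on your PATH. By default it only prints the command so it is safe to paste.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

acc=SRR390728      # accession from the question
outdir=fastq_out   # hypothetical output directory

# No -Z here, so fastq-dump writes files instead of streaming to the terminal.
# --split-files writes one file per mate for paired-end data.
run fastq-dump -X 5 --split-files --outdir "$outdir" "$acc"
```

With RUN=1 this should leave SRR390728_1.fastq and SRR390728_2.fastq in the output directory. Drop -X 5 to dump the whole run rather than the first five spots.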

ADD COMMENT • link 10.0 years ago by Devon Ryan 105k

0

By default (now in 2019), one should use --split-3 to separate out potential singletons, which could otherwise break the pairing between the two mate fastq files and eventually crash the aligner.
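A quick way to check that the pairing is intact is to compare read counts between the two mate files. This is only a rough sketch: it assumes uncompressed FASTQ with exactly four lines per read, and the function names are made up for this example.

```shell
# Count records in an uncompressed FASTQ file (assumes 4 lines per read).
fastq_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Report whether two mate files hold the same number of reads.
check_pairs() {
  r1=$(fastq_reads "$1")
  r2=$(fastq_reads "$2")
  if [ "$r1" -eq "$r2" ]; then
    echo "OK: $r1 read pairs"
  else
    echo "MISMATCH: $r1 vs $r2 reads"
    return 1
  fi
}
```

Usage would be something like `check_pairs SRR390728_1.fastq SRR390728_2.fastq`. Equal counts do not prove that the reads are in the same order, so treat this as a first sanity check only.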

ADD REPLY • link 6.2 years ago by ATpoint 88k

0

There are several tools that accept the SRA format, such as hisat2, bowtie2 and GATK, but if you need fastq files I recommend using fasterq-dump over fastq-dump, as it is much faster at dumping fastq files.

ADD REPLY • link 5.2 years ago by yskripchenko • 0

0

fasterq-dump is pretty useless in my opinion, as it does not offer gzip compression. The gains in speed are lost when you have to compress the files manually afterwards. That might be OK for smaller files, but you are not going to keep uncompressed WGS data on your disk, or at least you should not. Either download directly as fastq or use parallel-fastq-dump, as suggested in "Fast download of FASTQ files from the European Nucleotide Archive (ENA)".

ADD REPLY • link 5.2 years ago by ATpoint 88k

0

Not having a gzip option on the command line is not much of a problem, as you can add compression as a step in your process for dumping data. I ran a quick experiment with SRR10996301, which generates only about a 1Gb file for each read. I pulled the data into my GCP VM from their cloud locations instead of the NCBI location, and here are the times I got for fastq-dump:

Read 10234248 spots for SRR10996301
Written 10234248 spots for SRR10996301

real    19m34.332s
user    19m24.236s
sys     0m9.912s

Here are the times for fasterq-dump followed by gzip on both files (done in a simple bash script):

spots read      : 10,234,248
reads read      : 20,468,496
reads written   : 20,468,496

real    6m18.689s
user    3m45.344s
sys     0m35.736s

Also, fastq-dump hasn't been updated in 4 years and they might deprecate it.
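For reference, a dump-then-compress script along these lines could look like the sketch below. This is not the exact script used for the timings above. The `run` helper, the `RUN` variable and the thread count are hypothetical, and it assumes fasterq-dump and gzip are on your PATH. By default it only prints the commands.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

acc=SRR10996301   # accession from the experiment above
threads=4         # hypothetical thread count

# fasterq-dump writes uncompressed FASTQ; paired data ends up in _1 and _2 files.
run fasterq-dump --split-files --threads "$threads" "$acc"

# Compress each mate file afterwards, since fasterq-dump has no gzip option.
for mate in 1 2; do
  run gzip "${acc}_${mate}.fastq"
done
```

If pigz is available it can replace gzip to compress with multiple threads, which narrows the gap further.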

ADD REPLY • link 5.2 years ago by yskripchenko • 0

ATpoint · Answer 3 · 2019-03-08

0

6.2 years ago

cigdemselli ▴ 30

The default location for the "download repository" is:

Linux: /home/[user_name]/ncbi/public
Mac OS X: /Users/[user_name]/ncbi/public
Windows: C:\Users\[user_name]\ncbi\public

Please see this link for more details: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=std#s-4
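To check whether a given run is already in that repository on Linux or Mac, something like the sketch below may help. It assumes the older default layout, where the .sra files sit in an sra subdirectory of the path listed above. Newer toolkit releases or a custom configuration may put them elsewhere, and the function names are made up for this example.

```shell
# Default per-user repository for .sra files in older toolkit releases
# (assumption: the "sra" subdirectory under ~/ncbi/public).
sra_cache_dir() {
  echo "${HOME}/ncbi/public/sra"
}

# Print the cached .sra path for an accession, or fail if it is absent.
cached_sra() {
  f="$(sra_cache_dir)/$1.sra"
  if [ -f "$f" ]; then
    echo "$f"
  else
    echo "not cached: $1" >&2
    return 1
  fi
}
```

For example, `cached_sra SRR390728` prints the full path if prefetch has already downloaded that run.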

ADD COMMENT • link updated 6.2 years ago by ATpoint 88k • written 6.2 years ago by cigdemselli ▴ 30

0

Hi, welcome to Biostars :)

Please use the formatting bar to mark up code examples, paths, example data etc. I made the changes for you this time.

ADD REPLY • link 6.2 years ago by ATpoint 88k

score 0 · Answer 4 · 2019-06-12

0

5.9 years ago

Drew.Judell • 0

I had to use the find command to locate the directory, which is located at the root directory rather than the user directory. prefetch downloads SRA data to ~/ncbi/public/sra, even though the tools are installed at /home/[user-name]/ncbi.
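A search along those lines could look like this. It is a minimal sketch, and `find_sra` is just a name made up for this example.

```shell
# List .sra files under a starting directory, hiding permission errors.
find_sra() {
  find "$1" -type f -name '*.sra' 2>/dev/null
}
```

Calling `find_sra "$HOME"` searches your home directory, and `find_sra /` searches the whole filesystem, which can take a while.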

ADD COMMENT • link 5.9 years ago by Drew.Judell • 0

0

Hi,

If you are on a Linux machine, you should update to the 2.10.2 release of the SRA toolkit and configure it so that prefetch downloads into the directory you want. Here are the instructions on how to configure it: https://github.com/ncbi/sra-tools/wiki/05.-Toolkit-Configuration
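As a rough sketch of what that can look like on the command line: the `run` helper, the `RUN` variable and the `sra_data` directory are hypothetical, and it assumes a recent toolkit is on your PATH. By default it only prints the commands.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

acc=SRR390728            # example accession from this thread
dest="$HOME/sra_data"    # hypothetical target directory

# Option 1: open the interactive configuration described on the wiki page.
run vdb-config --interactive

# Option 2: choose the destination for a single download.
run prefetch --output-directory "$dest" "$acc"
```

The first option changes the toolkit configuration for your user, while the second only affects that one prefetch call.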

ADD REPLY • link 5.2 years ago by yskripchenko • 0