Downloading SRA data using the SRA Toolkit
9.5 years ago
zizigolu ★ 4.3k

Hey,

I opened cmd and went to C:\Users\yang\Downloads\Compressed\sratoolkit.2.5.0-win64\bin

Then I typed fastq-dump -X 5 -Z SRR390728

In cmd I see:

C:\Users\yang\Downloads\Compressed\sratoolkit.2.5.0-win64\bin\fastq-dump -X 5 -Z SRR390728
Read 5 spots for SRR390728
Written 5 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96;;;;(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4;;5;;;;;;;;;;;;;;;;;;;;;;;;9;;;;;;;464262

(.....)

Where can I find the downloaded file? I was trying to download an SRA file from GEO.
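The console output above is ordinary FASTQ: each read takes four lines (an @ header, the sequence, a + line, and one quality character per base). As a minimal Python sketch, assuming unwrapped four-line records and Phred+33 quality encoding (the fastq-dump default), such text can be parsed and its qualities decoded as follows; `parse_fastq` and `phred33` are illustrative names, not part of the SRA Toolkit:

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # header line without the leading '@'
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text with exactly four lines per read."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for i in range(0, len(lines), 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at non-empty line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch for {header}")
        yield FastqRecord(header[1:], sequence, quality)


def phred33(quality: str) -> List[int]:
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(ch) - 33 for ch in quality]


if __name__ == "__main__":
    sample = "@read1 1 length=4\nACGT\n+read1 1 length=4\n;;9(\n"
    for record in parse_fastq(sample):
        print(record.name, record.sequence, phred33(record.quality))
```

Under Phred+33, the ';' characters that dominate the quality lines above decode to 26 and '(' decodes to 7.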

Tags: RNA-Seq, rna-seq, sequencing
5.7 years ago
GenoMax 147k

Since this old thread has been reactivated, I will add two current methods that should be used for this purpose instead of the SRA Toolkit:


GenoMax, as you already suggested to me, the tutorial Fast download of FASTQ files from the European Nucleotide Archive (ENA) is very applicable here.


This tutorial also has a section at the bottom on how to use the SRA Toolkit (prefetch and fastq-dump) efficiently.

9.5 years ago

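While you could download the file from GEO (or directly from SRA), it would be easier to remove the -Z option (and -X 5, unless you only want the first five spots) and let fastq-dump write the FASTQ files for you. The -Z (--stdout) option prints the reads to the console instead of saving them, which is why no file was written; without it, fastq-dump writes to the current directory or to the directory given with --outdir. Very few programs accept SRA files directly, but nearly everything accepts FASTQ.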

BTW, that's a paired-end dataset, so make sure you specify --split-files.
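Put together, the two suggestions amount to a different fastq-dump command line: no -Z, so reads go to files, plus --split-files for paired data. A small Python sketch that assembles such a command as an argument list (the helper name `fastq_dump_args` is hypothetical; -X, --outdir, --split-files and --gzip are standard fastq-dump options):

```python
import shlex
from typing import List, Optional


def fastq_dump_args(accession: str, outdir: str = ".",
                    max_spots: Optional[int] = None,
                    paired: bool = True, gzip: bool = False) -> List[str]:
    """Build a fastq-dump command that writes FASTQ files instead of printing them.

    Deliberately omits -Z (--stdout), which is what sends reads to the console.
    """
    args = ["fastq-dump", "--outdir", outdir]
    if max_spots is not None:
        args += ["-X", str(max_spots)]  # only dump up to spot N
    if paired:
        args.append("--split-files")    # one file per mate: <acc>_1.fastq, <acc>_2.fastq
    if gzip:
        args.append("--gzip")           # write .fastq.gz instead of plain .fastq
    args.append(accession)
    return args


if __name__ == "__main__":
    print(shlex.join(fastq_dump_args("SRR390728", max_spots=5)))
```

Passing the list to `subprocess.run(..., check=True)` would execute it, assuming fastq-dump is on your PATH; for SRR390728 this should leave SRR390728_1.fastq and SRR390728_2.fastq in the output directory.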


As of 2019, one should use --split-3 by default: it writes unpaired (singleton) reads to a separate file, so they cannot break the pairing between the two mate FASTQ files, which might otherwise crash the aligner.
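Why the pairing matters: aligners that take two mate files generally assume that record N in the _1 file is the mate of record N in the _2 file. A minimal Python sketch, assuming fastq-dump-style headers where both mates share the same first token (e.g. SRR390728.1) and files without blank lines, that reports the first position where two mate files fall out of step; the function names are illustrative:

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_ids(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the first header token (without '@') of every four-line FASTQ record."""
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0:
            yield line[1:].split()[0]


def first_desync(ids_1: Iterable[str], ids_2: Iterable[str]
                 ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, id in file 1, id in file 2) at the first mismatch.

    None on one side means that file ran out of records first.
    Returns None when both files list the same ids in the same order.
    """
    for index, (a, b) in enumerate(zip_longest(ids_1, ids_2)):
        if a != b:
            return index, a, b
    return None
```

For two files opened with `open(...)`, calling `first_desync(read_ids(f1), read_ids(f2))` returns None when they are in sync; a singleton left inside one mate file shows up as the first index where the ids differ.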


There are several tools that accept the SRA format directly, such as hisat2, bowtie2 and GATK, but if you need FASTQ files, I recommend fasterq-dump over fastq-dump as it is much faster at dumping them.
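A fasterq-dump call can likewise be assembled as an argument list. A hedged Python sketch; the helper name is hypothetical, while --split-3, --threads, --outdir and --temp are fasterq-dump options (--split-3 is already its default and is spelled out only for clarity). The output is uncompressed FASTQ:

```python
import shlex
from typing import List, Optional


def fasterq_dump_args(accession: str, outdir: str = ".", threads: int = 6,
                      tmpdir: Optional[str] = None) -> List[str]:
    """Build a fasterq-dump command line as an argument list."""
    args = [
        "fasterq-dump",
        "--split-3",                 # mates to _1/_2, unpaired reads to a third file
        "--threads", str(threads),   # number of worker threads
        "--outdir", outdir,
    ]
    if tmpdir is not None:
        args += ["--temp", tmpdir]   # scratch directory for intermediate files
    args.append(accession)
    return args


if __name__ == "__main__":
    print(shlex.join(fasterq_dump_args("SRR10996301", threads=8, tmpdir="/scratch/tmp")))
```

The /scratch/tmp path is only an example; pointing --temp at a fast local disk is the usual reason to set it.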


In my opinion fasterq-dump is pretty useless because it does not offer gzip compression. The speed gain is lost when you have to compress the files manually afterwards. That might be OK for smaller files, but you are not going to keep uncompressed WGS data on your disk, or at least you should not. Either download the FASTQ files directly or use parallel-fastq-dump, as suggested in Fast download of FASTQ files from the European Nucleotide Archive (ENA).
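The manual compression step is short to script. A minimal Python sketch using only the standard library (the function name is hypothetical); it streams each file through gzip at level 6, the gzip command-line default, and deletes the original only after the compressed copy has been written:

```python
import gzip
import shutil
from pathlib import Path
from typing import Iterable, List


def gzip_in_place(paths: Iterable[Path], level: int = 6) -> List[Path]:
    """Compress each file to <name>.gz and delete the uncompressed original."""
    compressed = []
    for path in paths:
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb", compresslevel=level) as dst:
            shutil.copyfileobj(src, dst)  # copies in chunks, never loads the whole file
        path.unlink()  # reached only if the .gz was written without error
        compressed.append(target)
    return compressed
```

After fasterq-dump finishes, something like `gzip_in_place(sorted(Path("out").glob("SRR10996301*.fastq")))` would compress its output; the directory and pattern are examples. This runs on a single core, so it will not be faster than the gzip command itself.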


Not having a gzip option on the command line is not much of a problem, as you can add compression as a step in your dump process. I ran a quick experiment with SRR10996301, which produces only about 1 GB per read file. I pulled the data into my GCP VM from its cloud location instead of the NCBI location, and here are the times I got for fastq-dump:

Read 10234248 spots for SRR10996301
Written 10234248 spots for SRR10996301

real    19m34.332s
user    19m24.236s
sys     0m9.912s

Here are the times for fasterq-dump followed by gzip on both files (done in a simple bash script):

spots read      : 10,234,248
reads read      : 20,468,496
reads written   : 20,468,496

real    6m18.689s
user    3m45.344s
sys     0m35.736s

Also, fastq-dump hasn't been updated in 4 years and they might deprecate it.
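To turn the two `real` (wall-clock) times above into a single ratio, a small Python sketch that parses bash `time` values of the form 19m34.332s (the function names are illustrative):

```python
import re

# bash's `time` prints real/user/sys as <minutes>m<seconds>s, e.g. 19m34.332s
_BASH_TIME = re.compile(r"^(\d+)m(\d+(?:\.\d+)?)s$")


def bash_time_to_seconds(value: str) -> float:
    """Convert a bash `time` field such as '19m34.332s' into seconds."""
    match = _BASH_TIME.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def speedup(before: str, after: str) -> float:
    """How many times faster `after` is than `before`."""
    return bash_time_to_seconds(before) / bash_time_to_seconds(after)


if __name__ == "__main__":
    # real times: fastq-dump vs. fasterq-dump followed by gzip
    print(f"{speedup('19m34.332s', '6m18.689s'):.2f}x")
```

This prints 3.10x: for this single accession and run, fasterq-dump plus gzip took roughly a third of the wall-clock time of fastq-dump.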

5.7 years ago
cigdemselli ▴ 30

The default location for the "download repository" is:

Linux: /home/[user_name]/ncbi/public
Mac OS X: /Users/[user_name]/ncbi/public
Windows: C:\Users\[user_name]\ncbi\public

Please see this link for more details: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=std#s-4
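All three defaults follow one pattern, <home directory>/ncbi/public. A short Python sketch (the helper names are hypothetical) that reproduces the listed paths and, more robustly, derives the location from the real home directory of whoever runs the tools. That also covers cases such as running as root, whose home on Linux is usually /root rather than /home/root. These are the defaults described in the linked documentation; newer toolkit releases can be configured differently:

```python
from pathlib import Path, PurePath, PurePosixPath, PureWindowsPath


def documented_default(user: str, system: str) -> PurePath:
    """Default download repository as listed per OS (system as from platform.system())."""
    if system == "Windows":
        return PureWindowsPath("C:/Users", user, "ncbi", "public")
    if system == "Darwin":  # macOS
        return PurePosixPath("/Users", user, "ncbi", "public")
    return PurePosixPath("/home", user, "ncbi", "public")  # Linux and other Unix


def current_user_default() -> Path:
    """Same pattern, based on the actual home directory of the current user."""
    return Path.home() / "ncbi" / "public"


if __name__ == "__main__":
    print(current_user_default() / "sra")  # subfolder where prefetch put .sra files
```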


Hi, welcome to Biostars :)

Please use the formatting bar to indicate code examples, paths, example data etc. I did the changes for you this time.


5.5 years ago

I had to use the find command to locate the directory, which was under the root directory rather than my user directory. prefetch downloads SRA data to ~/ncbi/public/sra, even though I had installed the tools at /home/[user-name]/ncbi.
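Instead of searching the whole filesystem, one can look for a given accession's .sra file under a few candidate directories. A Python sketch; the function name and the candidate list are assumptions to adjust to your setup:

```python
import os
from pathlib import Path
from typing import Iterable, List


def find_sra_files(accession: str, roots: Iterable[Path]) -> List[Path]:
    """Return every <accession>.sra found below any of the given root directories."""
    target = f"{accession}.sra"
    hits = []
    for root in roots:
        # os.walk yields nothing for a missing root and skips unreadable directories
        for dirpath, _dirnames, filenames in os.walk(root):
            if target in filenames:
                hits.append(Path(dirpath) / target)
    return sorted(set(hits))  # set() drops duplicates when roots overlap


if __name__ == "__main__":
    candidates = [Path.home() / "ncbi", Path("/root/ncbi")]  # hypothetical search roots
    for path in find_sra_files("SRR390728", candidates):
        print(path)
```

The shell equivalent is roughly `find / -name 'SRR390728.sra' 2>/dev/null`, which is slower because it scans everything.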


Hi,

If you are on a Linux machine, you should update to release 2.10.2 of the SRA Toolkit and configure it so that prefetch downloads into the directory you want. Instructions on how to configure it are here: https://github.com/ncbi/sra-tools/wiki/05.-Toolkit-Configuration
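To check where the toolkit is currently configured to download, one can read the user configuration file rather than edit it by hand (vdb-config -i is the supported way to change it). A minimal Python sketch, assuming the file lives at ~/.ncbi/user-settings.mkfg and stores the download location under the key /repository/user/main/public/root; both are assumptions to verify against your own installation:

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Setting lines are assumed to look like: /some/key = "value"
_SETTING = re.compile(r'^\s*(/[^\s=]+)\s*=\s*"(.*)"\s*$')

# Assumed key for the public download repository; verify it in your own file.
PUBLIC_ROOT_KEY = "/repository/user/main/public/root"


def parse_mkfg(text: str) -> Dict[str, str]:
    """Parse key = "value" lines; anything else (comments, blanks) is ignored."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        match = _SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def public_repository_root(
        config_path: Path = Path.home() / ".ncbi" / "user-settings.mkfg") -> Optional[str]:
    """Return the configured download root, or None if the file or key is missing."""
    if not config_path.is_file():
        return None
    return parse_mkfg(config_path.read_text()).get(PUBLIC_ROOT_KEY)


if __name__ == "__main__":
    print(public_repository_root())
```

The sketch only reads the file, so it cannot break an existing configuration.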
