Bulk download with prefetch but unable to 'zip' (plus more...)
3.1 years ago
j_eag ▴ 10

Hi (beginner here so go easy on me). I'm practicing different ways of downloading. I have various questions despite doing a lot of googling on the matter.

1) I'm trying to run something like this (I know these aren't the exact commands for prefetch):

prefetch $(<SRAacclist.txt) --gzip --outdir /scratch/eg5/trial2/sns/fqdata

When I run prefetch $(<SRAacclist.txt) my files do get downloaded of course, but they're not zipped, or in the folder I want them to be. Additionally it downloads extra sra folders, when all I want is the fastq file. How can I specify this?

2) All my modules are loaded (edirect, sra, etc.), yet I keep getting a "not found" error for "--format":

esearch -db sra -query PRJNA386935 | efetch -format runinfo | cut -d "," -f 1 > SRR.numbers

Any ideas?

3) For downloading from SRA to an HPC cluster folder: prefetch vs parallel vs wget vs fastq-dump. What do you guys think? So far prefetch has been the fastest, but fastq-dump seems to be the most easily 'customizable'.

fastq sequencing RNA-seq sra • 3.1k views

prefetch has no gzip option afaik, and one would make no sense because the sra format is already binary and compressed. There is also no --outdir, but there are:

-o|--output-file <FILE>          write file to FILE when downloading 
                                   single file 
-O|--output-directory <DIRECTORY>  save files to DIRECTORY/ 

Type prefetch -h and read the help.
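
As a rough sketch of how the -O option above could be combined with an accession list: the helper name fetch_sra_list and the one-accession-per-line file layout are assumptions made for illustration, not something sra-tools provides. The block only defines a function; nothing is downloaded until it is called.

```shell
# Hypothetical helper (not part of sra-tools): run prefetch once per
# accession and save everything under a single output directory.
# Usage: fetch_sra_list ACCESSION_FILE OUTPUT_DIR
fetch_sra_list() {
    acc_file="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    # Read the list on file descriptor 3 so prefetch cannot consume it
    # through stdin; the "|| [ -n ... ]" keeps a last line that lacks
    # a trailing newline.
    while IFS= read -r acc <&3 || [ -n "$acc" ]; do
        # Skip blank lines.
        if [ -z "$acc" ]; then
            continue
        fi
        prefetch --output-directory "$out_dir" "$acc"
    done 3< "$acc_file"
}
```

With the paths from the question, this would be called as: fetch_sra_list SRAacclist.txt /scratch/eg5/trial2/sns/fqdata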


Yeah, sorry, I should have clarified. I already checked the --help section and tested out different commands; my post was just to demonstrate what I'm trying to do. I've tried using "--type" and choosing *.fastq.gz, but none of it worked.


prefetch does not return fastq; it returns sra files, which require conversion to fastq with fastq-dump.
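
A minimal sketch of that conversion step, which is also where gzip compression can be requested (fastq-dump has a --gzip option). The helper name sra_to_fastq_gz is made up for illustration, and the SRA_DIR/<accession>/<accession>.sra layout is an assumption based on the per-run subdirectories that newer prefetch versions are reported to create; adjust the glob if your files sit elsewhere.

```shell
# Hypothetical helper: convert every .sra file found one level below
# SRA_DIR into gzip-compressed FASTQ written to OUT_DIR.
# Usage: sra_to_fastq_gz SRA_DIR OUT_DIR
sra_to_fastq_gz() {
    sra_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    # Assumes the SRA_DIR/<accession>/<accession>.sra layout.
    for sra in "$sra_dir"/*/*.sra; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$sra" ] || continue
        # --gzip compresses the output; --split-3 writes paired reads
        # to separate _1/_2 files.
        fastq-dump --gzip --split-3 --outdir "$out_dir" "$sra"
    done
}
```

Keeping the download (prefetch) and the conversion (fastq-dump) as separate steps means a failed conversion does not force a second download.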


For me (when I did prefetch $(<acclist.txt)), it downloaded fastq files (.fastq) and a folder for each that contained .sra files.


Ah, after users complained for years and years about that missing feature, they seem to have recently added the functionality to get fastq directly. Hah, only 10 years too late, but hey, why not :) Now gzip is missing. Yeah, that is sra-tools, a collection of mess; that is simply how it is /shrug.


What version of prefetch do you have? Mine does not download fastq. What's more, we usually need to specify how to unpack fastq; does it unpack the files?


2.11 seems to offer that now, so it is quite a recent addition.


I ran the new prefetch, and did not get a FASTQ file:

prefetch SRR14575325

The tool does indeed work differently: it creates a subdirectory for the SRA file rather than putting it under ~/ncbi/public/sra, but I don't get FASTQ files there.


add --type fastq


this is so typical of all sra tools in general

prefetch  SRR1972739

works fine and downloads the SRA file, but if I run this right after it:

prefetch --type fastq  SRR1972739 

prints:

2021-10-14T17:57:47 prefetch.2.11.2 err: name not found while resolving query within virtual file system module - failed to resolve accession 'SRR1972739' - no data ( 404 )

"prefetch" version 2.10.9


Just leaving this here fyi: sra-explorer : find SRA and FastQ download URLs in a couple of clicks

You do not need sra-tools to get data, there are (better) alternatives.


Alas the SRA has introduced changes that broke the Explorer. Only the links to EBI work. For example, this is what the explorer shows:

ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByRun/sra/SRR/SRR197/SRR1972739/SRR1972739.sra

the file is not there anymore, you need a different method to find it.
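
One possible "different method", given that the EBI links still work, is to ask ENA directly where the FASTQ files for a run live. This is only a sketch: the helper name ena_fastq_report_url is invented here, and the endpoint and field names are assumptions about ENA's Portal API that should be checked against ENA's own documentation before relying on them.

```shell
# Hypothetical helper: print the ENA Portal API "filereport" URL that
# lists FASTQ download locations for one run accession.
# The endpoint and field names are assumptions; check ENA's documentation.
ena_fastq_report_url() {
    printf 'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=%s&result=read_run&fields=run_accession,fastq_ftp\n' "$1"
}

# Intended use (needs network access):
#   curl -s "$(ena_fastq_report_url SRR1972739)"
```

The function only builds the URL, so it can be inspected before anything is fetched.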


Yes, this is known. They moved the SRA files to the cloud and Phil Ewels has not yet made the changes to the explorer, but there are issues pointing this out already.


This post prompted me to investigate the methods so that I know how to advise people.

I wrote up the results here: What is the best way to obtain FASTQ reads from the Short Read Archive (SRA)

3.1 years ago

efetch takes parameters with single minus

-format 
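
With the single-minus flag, the pipeline from the question still writes the CSV header into SRR.numbers along with the accessions. A small filter can drop it; this is a sketch that assumes the run accession is the first column of the runinfo CSV and that accessions look like SRR, ERR, or DRR followed by digits. The function name runinfo_to_accessions is made up for illustration.

```shell
# Hypothetical filter: read runinfo CSV on stdin and print only the run
# accessions from the first column, dropping the header and blank lines.
runinfo_to_accessions() {
    cut -d ',' -f 1 | grep -E '^[SED]RR[0-9]+$'
}

# Intended use (needs Entrez Direct and network access):
#   esearch -db sra -query PRJNA386935 | efetch -format runinfo \
#       | runinfo_to_accessions > SRR.numbers
```

The resulting file has one accession per line, which is the shape prefetch $(<SRR.numbers) expects.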

As for speed, the commands work in mysterious ways; the reasons for the speed differences are not properly explained.


Sorry, that's what the online source told me to use! Thanks, works now.


The command looks very much like what I advocate in the Biostar Handbook. As it turns out, Entrez Direct has been updated: it used to take both types of parameters, but it looks like it now only takes the single-minus form. So I have to update the book.

Edit: the book has been corrected
