Hi (beginner here so go easy on me). I'm practicing different ways of downloading. I have various questions despite doing a lot of googling on the matter.
1) I'm trying to run something like this (I know these aren't the exact commands for prefetch):
prefetch $(<SRAacclist.txt) --gzip --outdir /scratch/eg5/trial2/sns/fqdata
When I run prefetch $(<SRAacclist.txt)
my files do get downloaded of course, but they're not zipped or in the folder I want them in. Additionally, it downloads extra sra folders when all I want is the fastq file. How can I specify this?
2) All my modules are loaded (edirect, sra, etc.), yet I keep getting a "not found" error for "--format":
esearch -db sra -query PRJNA386935 | efetch -format runinfo | cut -d "," -f 1 > SRR.numbers
Any ideas?
3) For downloading from SRA to an HPC cluster folder: prefetch vs parallel vs wget vs fastq-dump. What do you guys think? So far prefetch has been the fastest, but fastq-dump seems to be the most easily 'customizable'.
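For reference on question 2, here is a minimal sketch of the same pipeline with the CSV header row filtered out. EDirect tools take single-dash options (`-db`, `-query`, `-format`), as in the command above, so if the error really mentions `--format` it may be worth checking what was actually typed. `runs_from_runinfo` is a hypothetical helper name, and the sketch assumes the run accession is the first column of the runinfo CSV under a header called `Run`.

```shell
#!/bin/sh
# Read a runinfo CSV on stdin and print one run accession per line.
# Header rows ("Run,...") and blank lines are dropped by value rather than
# by line number, in case the header appears more than once in the stream.
runs_from_runinfo() {
    awk -F ',' '$1 != "" && $1 != "Run" { print $1 }'
}

# Usage (needs network access and EDirect on PATH):
# esearch -db sra -query PRJNA386935 | efetch -format runinfo | runs_from_runinfo > SRR.numbers
```

Compared with a plain `cut -d "," -f 1`, this keeps the word `Run` out of the accession list, which matters if the file is later fed to prefetch.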
prefetch has no gzip option afaik, and one would make no sense because the sra format is already binary and compressed. There is also no
--outdir
option, but type
prefetch -h
and read the help.
Yeah, sorry, I should have clarified. I already checked the --help section and tested out different commands; my post was just to demonstrate what I'm trying to do. I've tried using "--type" and choosing *.fastq.gz, but none of it worked.
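To make the goal from question 1 concrete: since prefetch fetches .sra files, compression and the output folder are usually handled at the conversion step instead. The sketch below assumes sra-tools is on PATH and uses the `--gzip`, `--split-3` and `--outdir` options of fastq-dump (check `fastq-dump -h` for your version; `--split-3` is only an assumption about how the reads should be split). `build_cmds` is a hypothetical helper that prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Print (do not run) one prefetch + fastq-dump pair per accession.
# $1 = file with one accession per line, $2 = output directory for FASTQ.
build_cmds() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Strip stray whitespace and carriage returns, skip blank lines.
        acc=$(printf '%s' "$acc" | tr -d '[:space:]')
        [ -z "$acc" ] && continue
        printf 'prefetch %s\n' "$acc"
        printf 'fastq-dump --gzip --split-3 --outdir %s %s\n' "$2" "$acc"
    done < "$1"
}

# Usage: inspect the commands, then pipe them to a shell to execute.
# build_cmds SRAacclist.txt /scratch/eg5/trial2/sns/fqdata
# build_cmds SRAacclist.txt /scratch/eg5/trial2/sns/fqdata | sh
```

Printing first is a deliberate choice: on a cluster it is cheap to check the generated lines before committing to a long download.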
prefetch
does not return fastq; it returns sra files, which require conversion to fastq with fastq-dump.
For me (when I did
prefetch $(<acclist.txt)
), it downloaded fastq files (.fastq) and a folder for each that contained .sra files.
Ah, after years and years of users complaining about that missing feature, they seem to have recently added the functionality to get fastq directly. Hah, only 10 years too late, but hey, why not :) Now gzip is missing. Yeah, that is sra-tools, a collection of mess; that is simply how it is /shrug.
What version of prefetch do you have? Mine does not download fastq; moreover, we usually need to specify how to unpack fastq. Does it unpack the files?
2.11 seems to offer that now, so it is quite a recent addition.
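Since the behaviour discussed here seems to depend on the sra-tools version, a small check can make it explicit which side of 2.11 an installation is on. This is only a sketch: `extract_version` and `version_at_least` are hypothetical helper names, the version line is assumed to look like `"prefetch" version 2.10.9`, and `sort -V` is assumed to be available (it is part of GNU coreutils).

```shell
#!/bin/sh
# Pull the first dotted version number out of a line such as:
#   "prefetch" version 2.10.9
extract_version() {
    grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Succeed when version $1 is at least version $2.
# The smaller of the two sorts first, so $1 >= $2 exactly when $2 comes out on top.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage, assuming `prefetch --version` prints a line like the one above:
# v=$(prefetch --version | extract_version)
# if version_at_least "$v" 2.11; then echo "2.11 or newer"; else echo "older than 2.11"; fi
```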
I ran the new prefetch, and did not get a FASTQ file:
The tool does indeed work differently: it creates a subdirectory for the SRA file rather than putting it under
~/ncbi/public/sra
but I don't get FASTQ files there.
Add
--type fastq
this is so typical of all sra tools in general
works fine, downloads the SRA file but right after it if I do:
prints:
"prefetch" version 2.10.9
Just leaving this here fyi: sra-explorer : find SRA and FastQ download URLs in a couple of clicks
You do not need sra-tools to get data, there are (better) alternatives.
Alas the SRA has introduced changes that broke the Explorer. Only the links to EBI work. For example, this is what the explorer shows:
the file is not there anymore, you need a different method to find it.
Yes, this is known. They moved the SRA files to the cloud and Phil Ewels has not yet made the changes to the explorer, but there are issues pointing this out already.
This post prompted me to investigate the methods so that I know how to advise people.
I wrote up the results here:
What is the best way to obtain FASTQ reads from the Short Read Archive (SRA)