fastq-dump is slowwwww.
This is the fastest way I have found:
vdb-dump SRR453569.sra | awk 'NR%2==1 { print $0 "/1" } ; NR%2==0 { print substr($0,1,length($0)/2) }' - > SRR453569_1.fq
vdb-dump SRR453569.sra | awk 'NR%2==1 { print $0 "/2" } ; NR%2==0 { print substr($0,length($0)/2 + 1) }' - > SRR453569_2.fq
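The two commands can also be combined so the archive is read only once, by duplicating the stream in memory and writing both files from it.
Posting this in case it saves someone time.
I see a large difference: about 6x in elapsed time and about 16x in user CPU time in the run below.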
My run looks like this:
/usr/bin/time fastq-dump --split-3 SRR453569.sra
Read 4032514 spots for SRR453569.sra
Written 4032514 spots for SRR453569.sra
370.25user 3.18system 6:17.10elapsed 99%CPU (0avgtext+0avgdata 46324maxresident)k
0inputs+5064544outputs (0major+246626minor)pagefaults 0swaps
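and for vdb-dump (writing only the first mate file; note that /usr/bin/time at the front of a pipeline times vdb-dump alone, not the awk stage):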
/usr/bin/time vdb-dump -I -f fastq SRR453569.sra | awk 'NR%2==1 { print $0 "/1" } ; NR%2==0 { print substr($0,0,length($0)/2) }' - > SRR453569_1.fq
23.49user 3.23system 0:58.20elapsed 45%CPU (0avgtext+0avgdata 330488maxresident)k
1026496inputs+64outputs (6major+635833minor)pagefaults 0swaps
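To put a number on the difference, the elapsed and user times from the two /usr/bin/time reports above can simply be divided. A small sketch: the figures are copied from those reports, the variable names are made up for illustration, and the vdb-dump figures cover only the first mate file.

```shell
# Times copied from the two /usr/bin/time reports above.
fastq_dump_elapsed=$(awk 'BEGIN { print 6*60 + 17.10 }')   # 6:17.10 in seconds
vdb_dump_elapsed=58.20                                      # 0:58.20 in seconds

# Ratio of wall-clock times and of user CPU times (370.25user vs 23.49user).
elapsed_ratio=$(awk -v a="$fastq_dump_elapsed" -v b="$vdb_dump_elapsed" 'BEGIN { printf "%.1f", a/b }')
user_ratio=$(awk 'BEGIN { printf "%.1f", 370.25/23.49 }')

echo "elapsed: ${elapsed_ratio}x, user CPU: ${user_ratio}x"
```

So the gap is closer to 6.5x in wall-clock time and 15.8x in user CPU time for this run.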
UPDATE: Sorry, I forgot the vdb-dump flags in the first two commands above. The corrected commands are:
vdb-dump -I -f fastq SRR453569.sra | awk 'NR%2==1 { print $0 "/1" } ; NR%2==0 { print substr($0,1,length($0)/2) }' - > SRR453569_1.fq
vdb-dump -I -f fastq SRR453569.sra | awk 'NR%2==1 { print $0 "/2" } ; NR%2==0 { print substr($0,length($0)/2 + 1)}' - > SRR453569_2.fq
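What the two awk filters do is easier to see on a toy record. The sketch below uses made-up data (one spot whose 8 bases are two 4-base mates joined end to end) and assumes, as the commands above do, that both mates have the same length. It uses 1 as the substr start index, because awk strings are 1-indexed and implementations do not all treat a start of 0 the same way.

```shell
# Toy interleaved record (made-up data): header, joined sequence, '+' line, joined qualities.
toy_dir=$(mktemp -d)
printf '@SPOT.1\nAAAACCCC\n+SPOT.1\nIIIIJJJJ\n' > "$toy_dir/toy.fq"

# Mate 1: odd lines are headers and get "/1"; even lines keep their first half.
awk 'NR%2==1 { print $0 "/1" } NR%2==0 { print substr($0, 1, length($0)/2) }' \
    "$toy_dir/toy.fq" > "$toy_dir/toy_1.fq"

# Mate 2: headers get "/2"; even lines keep their second half.
awk 'NR%2==1 { print $0 "/2" } NR%2==0 { print substr($0, length($0)/2 + 1) }' \
    "$toy_dir/toy.fq" > "$toy_dir/toy_2.fq"

cat "$toy_dir/toy_1.fq" "$toy_dir/toy_2.fq"
```

The first file ends up with AAAA / IIII and the second with CCCC / JJJJ, each under a header tagged /1 or /2.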
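Again, the two commands can be combined.
UPDATE2: Fastest so far: both mates written in a single pass (needs a shell with >( ) process substitution, e.g. bash):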
/usr/bin/time vdb-dump -I -f fastq SRR493371.sra | tee >(awk 'NR%2==1 { print $0 "/1" } ; NR%2==0 { print substr($0,1,length($0)/2) }' - > SRR493371_tee_1.fq) | awk 'NR%2==1 { print $0 "/2" } ; NR%2==0 { print substr($0,length($0)/2 + 1)}' - > SRR493371_tee_2.fq
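A variant that avoids tee and process substitution is to let a single awk process write both files. This is only a sketch on made-up toy data, not something I have timed against the tee version; the out1/out2 variable names and file names are illustrative, and it still assumes equal-length mates.

```shell
# Toy interleaved record (made-up data), standing in for the vdb-dump output stream.
one_dir=$(mktemp -d)
printf '@SPOT.1\nAAAACCCC\n+SPOT.1\nIIIIJJJJ\n' > "$one_dir/toy.fq"

# One awk process, two output files: headers are tagged, other lines are cut in half.
awk -v out1="$one_dir/both_1.fq" -v out2="$one_dir/both_2.fq" '
  NR%2==1 { print $0 "/1" > out1; print $0 "/2" > out2 }
  NR%2==0 { half = length($0)/2
            print substr($0, 1, half)  > out1
            print substr($0, half + 1) > out2 }
' "$one_dir/toy.fq"

cat "$one_dir/both_1.fq" "$one_dir/both_2.fq"
```

On real data the same awk program would read from the pipe instead of a file, i.e. vdb-dump -I -f fastq SRR493371.sra piped into it, with out1 and out2 set to the two .fq names.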
UPDATE3: Use the latest version of fastq-dump, i.e. 2.5.XX; previous versions have issues with running time.
... or just download the FASTQ files from ENA and skip all of the stupidity related to the SRA format ...
It is not easy to understand why this saves so much time. Could it be that the faster route skips some important filter or check? I think we should email the author of vdb-dump to clear this up.
That's true. I also think I am missing something, since the difference is substantial. Mailing the authors sounds like a good idea.
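Until the authors answer, a cheap sanity check on the output is possible. The sketch below defines a hypothetical helper, fastq_check (my own name, not part of the SRA toolkit), that counts records and flags sequence lines whose length differs from their quality line. It is shown on made-up toy files; it cannot tell whether vdb-dump skips a filter that fastq-dump applies, but the record count can at least be compared with the "Written 4032514 spots" line that fastq-dump reported.

```shell
# Hypothetical helper: record count, seq/qual length mismatches, and stray lines.
fastq_check() {
  awk 'NR%4==2 { seqlen = length($0) }
       NR%4==0 && length($0) != seqlen { bad++ }
       END { printf "%d records, %d seq/qual length mismatches, %d leftover lines\n",
                    NR/4, bad, NR%4 }' "$1"
}

# Toy files (made-up data): one well-formed record, one with a short quality line.
chk_dir=$(mktemp -d)
printf '@SPOT.1/1\nAAAA\n+SPOT.1/1\nIIII\n' > "$chk_dir/ok.fq"
printf '@SPOT.1/1\nAAAA\n+SPOT.1/1\nIII\n'  > "$chk_dir/bad.fq"

ok_report=$(fastq_check "$chk_dir/ok.fq")
bad_report=$(fastq_check "$chk_dir/bad.fq")
echo "$ok_report"
echo "$bad_report"
```

Run on both mate files, the record counts should match each other and the spot count from fastq-dump.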