Hello,
A month ago, I used the SRA Toolkit to download FASTQ files from a BioProject accession. Following the recommended steps, I generated a list of SRR accessions, ran prefetch, and then ran fasterq-dump (using parallel-fastq-dump) to obtain the data locally, which gave me fq.gz files named after the corresponding SRR accessions.
Recently, while writing a review for my project, I tried prefetch with the BioProject accession itself. Surprisingly, it not only worked but also downloaded the files in fq.gz format, something prefetch supposedly cannot do. Furthermore, the downloaded files carried the original project ID names (as used in the paper) rather than the SRR names. I am puzzled by this behavior and would appreciate any insight into why it occurred.
Anecdotal evidence is hard to comment on. Give a precise code example for reproduction.
Hey, sorry if my post was not adequate. To retrieve all the fq.gz data I just used:
prefetch PRJNA393611
Were you using two separate versions of sratoolkit at the two times? Functionality is routinely added with newer versions. Additional command line options may have been added to change the default behavior. There can be many explanations.
I guess it was something they added recently and there is no documentation for it, although I find it peculiar that the downloaded data lacks the dataset name and instead displays the names added by the authors. Regardless, I hope that in the future someone finds this post and simply runs prefetch with the library name, bypassing the need for a script to retrieve all SRRs as I did:
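The script itself is not shown in the thread. As a rough sketch only, a per-run workflow of the kind described in the first post might look like the following. It assumes NCBI Entrez Direct (esearch, efetch) and the SRA Toolkit (prefetch, fasterq-dump) are installed and on PATH; the function names runs_from_runinfo and fetch_project and the file name SRR_Acc_List.txt are made up for illustration and are not taken from the original post.

```shell
#!/bin/sh
# Sketch of a per-run download for one BioProject.
# Assumes esearch/efetch (Entrez Direct) and prefetch/fasterq-dump (SRA Toolkit).

# Print the run accessions (first column, "Run") of a runinfo CSV read from stdin.
# Header lines and blank lines are skipped; the header can repeat in long listings.
runs_from_runinfo() {
    awk -F',' '$1 != "" && $1 != "Run" { print $1 }'
}

# Hypothetical helper: download and convert every run of the given BioProject.
fetch_project() {
    project="$1"

    # 1. Build the list of run accessions for the project.
    esearch -db sra -query "$project" \
        | efetch -format runinfo \
        | runs_from_runinfo > SRR_Acc_List.txt || return 1

    # 2. Fetch each run, convert it to FASTQ, then compress the result.
    while read -r run; do
        prefetch "$run" \
            && fasterq-dump "$run" \
            && gzip "$run"*.fastq \
            || return 1   # fasterq-dump does not compress, hence the gzip step
    done < SRR_Acc_List.txt
}

# Only touch the network when a project accession is passed on the command line.
if [ "$#" -ge 1 ]; then
    fetch_project "$1"
fi
```

Saved under a name such as fetch_runs.sh (hypothetical), it would be called as `sh fetch_runs.sh PRJNA393611`. Files produced this way are named after the SRR accessions, which matches what the first post reports for the manual route.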
To answer your question: no, the version was the same, but I didn't know prefetch could download all the FASTQ files with just the library name. Anyway, thank you for your insights!