Hi there folks, I am trying to download SRA files for a dataset I compiled from the NIH Roadmap Data Matrix. The problem is that the dataset spans about 50-60 files. I have little intention of downloading them by hand, and I thought a quick wget would help. But for some reason each link provided just points to another subdirectory rather than to the actual *.sra files, which makes them a lot harder to download:
List:
Sample1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX099/SRX099571
Sample2 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX040/SRX040594
My script so far does this:
while read -r name url; do
    mkdir -p "$name"
    # -P takes the target directory; "/$name/" would point at the filesystem root
    wget -P "$name/" "$url"
done < List
I also tried several other approaches, for example:
wget --no-parent -r -l1 ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByExp/sra/SRX/SRX099/SRX099571/
My list just contains my sample names and the links provided by the data matrix. But wget seems to have problems reaching the subdirectory that holds the *.sra file unless I specify the path explicitly.
If anybody has an idea on how to solve this, I would be eternally grateful. By now I probably would have been done if I had downloaded them by hand.
How about just using fastq-dump? https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
Well, it would work, but I would still need to get the explicit identifier of every experiment, since the GEO accession does not work. I would also rather keep the *.sra files, since their compression is quite high and I do not need them all at once but iteratively. Unfortunately, the NIH Roadmap does not let me export the direct *.sra identifiers, only the subdirectories mentioned above. That means I would still have to look up the SRA IDs by hand, correct? Sorry if I overlooked something terribly obvious.
Short example: https://www.ncbi.nlm.nih.gov/geo/roadmap/epigenomics/?view=samples&sample=CD14%20primary%20cells
Then I hit export, take the file and create my list from this. If I am doing something stupid please let me know. Thank you for your help.
See if getting them from EBI-ENA is less painful. You could get fastq files directly, avoiding sratoolkit altogether.
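To look the experiments up somewhere else you would first need their accessions. Going by the two links in the question, the SRX accession looks like the last path component of each URL, so it can be pulled out of the list without any lookups. A rough sketch, where list_accessions is just a name I made up and the "name url" layout of the list is assumed from the question:

```shell
# Print "sample<TAB>SRX accession" for each line of a "name url" list.
# list_accessions is a hypothetical helper name, not part of any toolkit.
list_accessions() {
    local name url
    while read -r name url; do
        # Skip blank lines and lines without a URL.
        [ -n "$url" ] || continue
        # Drop a trailing slash, if any, then keep the last path component.
        url=${url%/}
        printf '%s\t%s\n' "$name" "${url##*/}"
    done < "$1"
}
```

Called as `list_accessions List`, this prints one sample name and accession per line, which you could then paste into whatever search form you prefer.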
If the SRX files are in order, you could print all the wget commands in a loop and run them in the terminal.
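Something along these lines might do for the printing part. It only prints the commands, so you can check them before anything is downloaded. print_wget_commands is a made-up name, and the depth of 2 is a guess on my part that the *.sra files sit one directory below each SRX directory; I have not checked this against the server.

```shell
# Print (do not run) one recursive wget command per line of a "name url" list.
# print_wget_commands is a hypothetical helper name.
print_wget_commands() {
    local list_file=$1 name url
    while read -r name url; do
        # Skip blank or incomplete lines.
        [ -n "$name" ] && [ -n "$url" ] || continue
        # -r recurse, -l 2 recursion depth (assumed, see above), -np never ascend
        # to the parent, -nd do not recreate the remote directory tree,
        # -A keep only *.sra files, -P download into the sample's directory.
        printf "wget -r -l 2 -np -nd -A '*.sra' -P '%s' '%s/'\n" "$name" "${url%/}"
    done < "$list_file"
}
```

For example, `print_wget_commands List > download.sh` writes the commands to a file; after a look through it, `bash download.sh` would run them.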