Creating a sample-specific directory on the fly when downloading files with wget
3 months ago
Hrituraj • 0

I want to download FASTQ files (more than 1000) using wget. While downloading, I want to save each file to a directory that matches its sample name. For instance, for the paired-end reads SRR22859377_1 and SRR22859377_2, I want the directory to be named SRR22859377. I want this to happen automatically for every sample during the download, without creating the directories beforehand.

I have more than 1000 samples and don't want to create a directory every time I download a sample.

Is it possible?

Thanks! :)

wget ncbi • 774 views

If possible, use wget --force-directories; otherwise, just write a shell script.


If the above does not work for you, you can try the following. I have not tested the code, though.

Assuming you have a file with the accessions of interest, you can use wget's -P (--directory-prefix) option to download into a directory of your choice; wget creates this directory if it does not exist.

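```bash
#!/usr/bin/env bash

base_url="..."   # the part of the download URL that all accessions share

# Read one accession per line; -P makes wget create the directory if needed.
# Adjust the file suffix (e.g. .fastq.gz) to match the server's file names.
while IFS= read -r accession; do
  [ -z "$accession" ] && continue   # skip blank lines
  wget "${base_url}${accession}_1" -P "${accession}"
  wget "${base_url}${accession}_2" -P "${accession}"
done < accessions.tx
```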

What is accessions.tx here?


A file listing the accession numbers you need to download, one per line.


It says that accessions.tx does not exist ("No such file or directory").


You need to make that file.
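For example, a minimal way to create that file. The two run accessions below are simply the ones mentioned in this thread, used as placeholders; replace them with your own list, one accession per line.

```bash
#!/usr/bin/env bash
# Write one run accession per line to accessions.tx, the file the loop above reads.
# These two accessions are placeholders taken from this thread.
cat > accessions.tx <<'EOF'
SRR22859377
SRR22851168
EOF
```

If you already have a list of read-file names such as SRR22859377_1 and SRR22859377_2, `sed 's/_[12]$//' names.txt | sort -u > accessions.tx` reduces them to unique sample accessions (names.txt is a hypothetical input file).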


I get what you mean now!


I'm having issues with base_url. Could you explain what base_url should look like? wget connects to the FTP server but fails to download the file.


base_url will be the part that each accession's ftp/http link has in common.
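For ENA specifically, there may be no single base_url shared by every run: the path contains directories derived from the accession itself, as in the ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/SRR22851168 path quoted further down in this thread. The sketch below builds the URL per accession instead. It assumes ENA's layout for run accessions with a three-letter prefix (SRR/ERR/DRR) and gzipped files named <run>_1.fastq.gz and <run>_2.fastq.gz; the function names are hypothetical, and it is worth checking a few generated URLs against ENA's file report before running it on 1000+ samples.

```bash
#!/usr/bin/env bash
# Sketch, assuming ENA's layout:
#   ftp.sra.ebi.ac.uk/vol1/fastq/<first 6 chars>/[<subdir>/]<run>/<run>_<mate>.fastq.gz
# where <subdir> depends on how many digits the accession has.

# Print the URL of mate $2 (1 or 2) for run accession $1.
ena_fastq_url() {
  local acc="$1" mate="$2"
  local dir1="${acc:0:6}"        # e.g. SRR228 for SRR22851168
  local digits="${acc:3}"        # numeric part, assuming a 3-letter prefix
  local subdir=""
  case "${#digits}" in
    7) subdir="00${digits: -1}/" ;;  # 7 digits: "00" + last digit
    8) subdir="0${digits: -2}/" ;;   # 8 digits: "0" + last two digits
    9) subdir="${digits: -3}/" ;;    # 9 digits: last three digits
  esac                               # 6 digits: no subdirectory
  printf 'ftp://ftp.sra.ebi.ac.uk/vol1/fastq/%s/%s%s/%s_%s.fastq.gz\n' \
    "$dir1" "$subdir" "$acc" "$acc" "$mate"
}

# Download both mates of one run into a directory named after the run.
# wget -P creates the directory if needed; -c resumes partial downloads.
download_run() {
  local acc="$1" mate
  for mate in 1 2; do
    wget -c -P "$acc" "$(ena_fastq_url "$acc" "$mate")"
  done
}
```

With both functions defined, `while IFS= read -r acc; do download_run "$acc"; done < accessions.tx` should download each pair into its own directory, created by wget as needed.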


Using --force-directories doesn't give me the directory I want; it creates something like ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/SRR22851168

This isn't what I want. o_0
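If you prefer to keep --force-directories, wget's -nH (--no-host-directories) and --cut-dirs options can strip the unwanted leading part of that path so that only the last remote directory, the run accession, is recreated locally. A minimal sketch, with a hypothetical helper that counts how many directories to cut from a given URL (4 for the SRR22851168 path above, 3 for ENA paths without the numeric subdirectory):

```bash
#!/usr/bin/env bash
# Sketch: keep -x (--force-directories) but drop the host (-nH) and all remote
# directories except the last one (--cut-dirs), so that
#   ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/SRR22851168/SRR22851168_1.fastq.gz
# is saved as ./SRR22851168/SRR22851168_1.fastq.gz

# Print how many leading remote directories to cut so only the last one remains.
cut_dirs_for() {
  local path="${1#*://}"       # strip the scheme, e.g. "ftp://", if present
  path="${path#*/}"            # strip the host name
  local dirs="${path%/*}"      # drop the file name, keep the directories
  awk -F/ '{ print NF - 1 }' <<<"$dirs"
}

# Download one URL, recreating only its last directory level locally.
fetch_into_run_dir() {
  local url="$1"
  wget -x -nH --cut-dirs="$(cut_dirs_for "$url")" "$url"
}
```

Calling `fetch_into_run_dir` on the URL in the comment should then place the file under SRR22851168/ in the current directory rather than under ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/.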


Use something like: How to download FASTQ files from the European Nucleotide Archive (ENA) to use them with FastQC etc.

As noted there, use Aspera as the download method since you have 1000+ sets. Move the files into their own directories after the download is done; that should be easy with a shell script.
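A minimal sketch of that post-download step. It assumes all files were downloaded into one directory and follow the <sample>_1.fastq.gz / <sample>_2.fastq.gz naming used by ENA; organize_by_sample is a hypothetical name, and other suffixes (for example uncompressed .fastq) would need a different pattern.

```bash
#!/usr/bin/env bash
# Move every <sample>_1.fastq.gz / <sample>_2.fastq.gz in a directory
# (default: the current one) into a subdirectory named <sample>.
organize_by_sample() {
  local dir="${1:-.}" f name sample
  for f in "$dir"/*_[12].fastq.gz; do
    [[ -e "$f" ]] || continue            # the glob matched nothing
    name="$(basename "$f")"
    sample="${name%_[12].fastq.gz}"      # SRR22859377_1.fastq.gz -> SRR22859377
    mkdir -p "$dir/$sample"              # no error if it already exists
    mv -- "$f" "$dir/$sample/"
  done
}
```

Running `organize_by_sample /path/to/downloads` (or `organize_by_sample` in the download directory) would move SRR22859377_1.fastq.gz and SRR22859377_2.fastq.gz into SRR22859377/, and likewise for every other sample.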


Please do not use bioinformatics as a tag unless your post is about the field of bioinformatics itself. For proper examples, please see Forum and News type posts under https://www.biostars.org/tag/bioinformatics/

I've removed the tag this time but please be more mindful in the future.
