Creating a sample-specific directory on the fly while downloading files with wget

11 months ago

Hrituraj • 0

I want to download more than 1000 FASTQ files using wget. While downloading, I want each file saved to a directory that matches the sample name. For instance, for the paired-end reads SRR22859377_1 and SRR22859377_2, I want the directory to be named SRR22859377. I want this to happen automatically for every sample during the download, without creating the directories beforehand.

I have more than 1000 samples and don't want to create a directory every time I download a sample.

Is it possible?

Thanks! :)

wget ncbi • 1.7k views

ADD COMMENT • link updated 11 months ago by Ram 45k • written 11 months ago by Hrituraj • 0

If possible, use wget --force-directories; otherwise, just write a shell script.

ADD REPLY • link 11 months ago by Pierre Lindenbaum 166k

If the above does not work for you, you can try the following, though I have not tested the code.

Assuming you have a file with the accessions of interest, you can use wget's -P option to download into a directory of your choice. wget creates this directory for you if it does not exist.

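#!/usr/bin/env bash

# Part of the download link shared by every accession (left elided here).
base_url="..."

# Read one accession per line; IFS= and -r keep each line exactly as written.
while IFS= read -r accession; do
  # -P sets the target directory; wget creates it if it does not exist.
  wget -P "${accession}" "${base_url}${accession}_1"
  wget -P "${accession}" "${base_url}${accession}_2"
done < accessions.tx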

ADD REPLY • link 11 months ago by Haci ▴ 740

What is accessions.tx here??

ADD REPLY • link 11 months ago by Hrituraj • 0

A file with accession numbers you need to get. One per line.

ADD REPLY • link 11 months ago by GenoMax 151k

It says "accessions.tx: No such file or directory".

ADD REPLY • link 11 months ago by Hrituraj • 0

You need to make that file.
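
A minimal sketch of creating it, assuming the file name accessions.tx that the script above reads and using the two run accessions mentioned in this thread as placeholder entries:

```bash
# Write one accession per line into accessions.tx.
# The two accessions are only examples; list your own runs here.
printf '%s\n' SRR22859377 SRR22851168 > accessions.tx

# Quick check: show what the download loop will read.
cat accessions.tx
```

With 1000+ samples you would more likely export the accession column from a run table into this file rather than type it by hand.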

ADD REPLY • link 11 months ago by GenoMax 151k

I get what you mean now!!

ADD REPLY • link 11 months ago by Hrituraj • 0

I am having issues with base_url. Could you give a few details on what base_url should look like? It connects to the FTP server but fails to download the file.

ADD REPLY • link 11 months ago by Hrituraj • 0

base_url is the part that every accession's FTP/HTTP link has in common.
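
One caveat, assuming the files come from the ENA FTP server: the directory part of the link may not be identical for every run, because it seems to embed pieces of the accession itself (for example vol1/fastq/SRR228/068/SRR22851168). In that case a single fixed base_url would not be enough. The sketch below builds that directory from an accession. The function name ena_fastq_dir is made up for illustration, and the layout rule (first six characters, then the digits past the ninth character zero-padded to three, then the accession) is an assumption you should check against a link that works for you.

```bash
# Sketch: derive the ENA-style directory for a run accession.
# Assumed layout: vol1/fastq/<first 6 chars>/<padded extra digits>/<accession>
# The function body runs in a subshell ( ... ) so its variables stay local.
ena_fastq_dir() (
  acc=$1
  prefix=$(printf '%.6s' "$acc")              # e.g. SRR228
  extra=$(printf '%s' "$acc" | cut -c10-)     # digits after the ninth character
  # Zero-pad the extra digits to three characters (68 -> 068, 7 -> 007).
  case ${#extra} in
    0) printf 'vol1/fastq/%s/%s\n' "$prefix" "$acc"; exit 0 ;;
    1) pad="00$extra" ;;
    2) pad="0$extra" ;;
    *) pad=$extra ;;
  esac
  printf 'vol1/fastq/%s/%s/%s\n' "$prefix" "$pad" "$acc"
)

ena_fastq_dir SRR22851168
```

Inside the download loop, something like "ftp://ftp.sra.ebi.ac.uk/$(ena_fastq_dir "$accession")/" could then take the place of a fixed base_url.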

ADD REPLY • link 11 months ago by Haci ▴ 740

Well, --force-directories does not give exactly the directory I want; it creates something like ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/SRR22851168

This isn't what I want. o_0
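
A possible way around this, as an untested sketch: wget's --no-host-directories and --cut-dirs options can trim the leading parts of the path that --force-directories creates, so that only the last directory remains. The helper name cut_dirs_for and the file name in the example URL are assumptions for illustration; only the directory path comes from the output above.

```bash
# Sketch: work out how many leading directory levels wget should drop so that
# only the last directory (the sample) is created locally.
cut_dirs_for() {
  dirs=${1#*://*/}      # drop "scheme://host/"
  dirs=${dirs%/*}       # drop the file name
  # Levels above the last directory = number of slashes left in the path.
  printf '%s\n' "$dirs" | awk -F/ '{ print NF - 1 }'
}

# Hypothetical file URL built on the directory path reported above.
url="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR228/068/SRR22851168/SRR22851168_1.fastq.gz"

# Print the command instead of running it; remove "echo" to download.
echo wget --force-directories --no-host-directories \
     --cut-dirs="$(cut_dirs_for "$url")" "$url"
```

For this URL the helper returns 4, which should leave the file under SRR22851168/ in the current directory. The count is computed per URL because shorter accessions may sit at a different depth.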

ADD REPLY • link 11 months ago by Hrituraj • 0

Use something like: How to download FASTQ files from the European Nucleotide Archive (ENA) to use them with FastQC etc.

As noted there, use Aspera as the download method since you have 1000+ sets. Move the files into their own directories after the download is done. That should be easy with a shell script.
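
A rough sketch of such a script, assuming all files landed in one directory and are named like SRR22859377_1.fastq.gz and SRR22859377_2.fastq.gz (the .fastq.gz suffix and the function name sort_into_sample_dirs are assumptions, adjust them to your files):

```bash
# Sketch: move each downloaded file into a directory named after its sample.
# Usage: sort_into_sample_dirs <download_dir>
sort_into_sample_dirs() {
  for f in "$1"/*_[12].fastq.gz; do
    [ -e "$f" ] || continue           # skip if the pattern matched nothing
    name=${f##*/}                     # file name without the leading path
    sample=${name%_[12].fastq.gz}     # SRR22859377_1.fastq.gz -> SRR22859377
    mkdir -p "$1/$sample"             # create the sample directory if missing
    mv "$f" "$1/$sample/"
  done
}

# Example: sort everything in the current directory.
# sort_into_sample_dirs .
```

Because mkdir -p is a no-op for an existing directory, both mates of a pair end up in the same sample directory without any special handling.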

ADD REPLY • link 11 months ago by GenoMax 151k

Please do not use bioinformatics as a tag unless your post is about the field of bioinformatics itself. For proper examples, please see Forum and News type posts under https://www.biostars.org/tag/bioinformatics/

I've removed the tag this time but please be more mindful in the future.

ADD REPLY • link 11 months ago by Ram 45k
