Question

Unable to download fastq files in parallel / SOS

3.6 years ago

j_eag ▴ 10

Hi!

I'm very new to all this, so bear with me if I use incorrect terminology. Also, English is my second language.

I'm trying to download my fastq files in parallel but it doesn't work and I keep receiving this error:

fastq-dump.2.10.9 err: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )

Does anyone have any suggestions? I can download fastq files individually, but an upcoming experiment will have over 50 files, so I need to be able to download them in parallel.

MORE INFO:

I am doing everything on an HPC cluster within the scratch directory. I have a directory named rnaseqtrial, and within it I have the accession list. Once I switch to a compute node, I simply run:

for i in $(cat /scratch/eag88/trial/SRR_Acc_List.txt ); do sbatch download.sh ${i}; done

download.sh :

#!/bin/bash
#SBATCH --job-name=download
#SBATCH --mail-type=ALL
#SBATCH --mail-user=my email
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=4G

module load sra-tools/2.10.9
output_dir= “/scratch/eag88/rnaseqtrial /rawsamples"
mkdir -p $output_dir
fastq-dump —gzip —split-files $1 —outdir $output dir

fastq sequencing RNA-seq slurm • 2.6k views

updated 2.2 years ago by Ram 45k • written 3.6 years ago by j_eag ▴ 10

The SRA Toolkit has a multithreading option, but it applies to a single file rather than to multiple files. I think you mean batch download here, not parallel download.

For batch download on an HPC cluster with the SLURM job scheduler, consider using the Nextflow SLURM executor, running each job separately, or batch-downloading the SRA files.

3.6 years ago by Renesh ★ 2.2k

Thanks for the response! I'm following a course so I just followed what they did. Do you have an idea as to why mine wouldn't work?

I tried doing a batch download but kept running into errors, and unfortunately I don't have enough terminal knowledge for the Nextflow SLURM executor.

3.6 years ago by j_eag ▴ 10

First, try running the command with any single accession, such as fastq-dump --split-files SRR8296149, on a compute node. If it works, then the issue is in your script.
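To check the submission loop itself before anything is submitted, one option is a dry run that only prints the sbatch commands. This is a minimal sketch: print_submit_cmds is a made-up helper name, and the list path in the example comment is the one from the question.

```shell
# print_submit_cmds LIST
# Dry run: print one sbatch command per accession in LIST instead of
# submitting it. (print_submit_cmds is a hypothetical helper name.)
print_submit_cmds() {
    # '|| [ -n "$acc" ]' keeps a last line that has no trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=$(printf '%s' "$acc" | tr -d '\r')   # drop Windows line endings
        [ -n "$acc" ] || continue                # skip blank lines
        printf 'sbatch download.sh %s\n' "$acc"
    done < "$1"
}

# Example (path taken from the question):
# print_submit_cmds /scratch/eag88/trial/SRR_Acc_List.txt
```

If the printed commands look right (one accession per line, nothing empty), the same loop with a real sbatch call should submit one job per accession.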

3.6 years ago by Renesh ★ 2.2k

Looks like you have an extra space in your output dir /scratch/eag88/rnaseqtrial /rawsamples in your script. Remove the space after rnaseqtrial and try again.

3.6 years ago by GenoMax 151k

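In addition, there is a missing underscore in the last line ($output dir instead of $output_dir). Also note that options need two plain hyphens (--gzip), not a single long dash, in case the dashes in your post are not just a copy-paste artifact. It should be:

      fastq-dump --gzip --split-files $1 --outdir $output_dir

Further, it is difficult to spot, but the following line opens with the wrong type of quote (a curly quote instead of a plain "). This might not be a problem in your actual script and may only come from copy-pasting in the browser (or copying directly from a Word file or PDF), but better check that it is correct. The space after the = sign also has to go, since the shell does not allow it in an assignment: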

      output_dir= “/scratch/eag88/rnaseqtrial /rawsamples"

      #should be: 

      output_dir="/scratch/eag88/rnaseqtrial/rawsamples"
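Putting the fixes from this thread together (no space after =, plain ASCII quotes, no space inside the path, $output_dir with an underscore, options written with two hyphens), download.sh could look roughly like the sketch below. The download_one function and the check for a missing accession are additions for illustration and are not part of the original script; the mail directives are left out for brevity. It has not been tested on the poster's cluster.

```shell
#!/bin/bash
#SBATCH --job-name=download
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=4G

# No space after "=", plain ASCII quotes, no space inside the path
output_dir="/scratch/eag88/rnaseqtrial/rawsamples"

# download_one ACCESSION (hypothetical helper, not in the original script):
# create the output directory and fetch one run as gzipped, split FASTQ files.
download_one() {
    mkdir -p "$output_dir" || return 1
    fastq-dump --gzip --split-files --outdir "$output_dir" "$1"
}

if [ -z "${1:-}" ]; then
    # Without an accession, fastq-dump has nothing to process
    echo "usage: sbatch download.sh <SRR accession>" >&2
else
    module load sra-tools/2.10.9
    download_one "$1"
fi
```

Quoting "$output_dir" and "$1" is not strictly required for these particular values, but it keeps the command intact if a path ever does contain a space.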

3.6 years ago by Michael 55k