11 months ago
Fadwa
Hi,
I am working with Snakemake to process a CSV file containing SRR IDs for downloading. In the initial rule, I use the SRA ID as a wildcard to fetch SRR files from NCBI. However, when I attempt to parallelize the job using the -j 2 option, the downloading step does not parallelize as expected. Can you please assist me with this issue?
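import os
import re

home = os.path.expanduser("~")
fichier_csv = os.path.join(home, 'sra_list.csv')

# Collect the accession from the first whitespace-separated field of each line
SRA_LIST = []
with open(fichier_csv, 'rt') as f:
    for line in f:
        line = line.split()[0].strip()
        # Keep only SRR/ERR/DRR run accessions
        if re.match(r'[SED]RR\d+$', line):
            SRA_LIST.append(line)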
rule fetch_fastq:
    output:
        config["RESULTS"] + "Fastq_Files/{sra}.fastq.gz"
    log:
        config["RESULTS"] + "Supplementary_Data/Logs/{sra}.sratoolkit.log"
    benchmark:
        config["RESULTS"] + "Supplementary_Data/Benchmark/{sra}.sratoolkit.txt"
    message:
        "fetch fastq from NCBI"
    params:
        conda = "sratoolkit",
        outdir = config["RESULTS"] + "Fastq_Files"
    threads: 8
    shell:
        """
        set +eu &&
        . $(conda info --base)/etc/profile.d/conda.sh &&
        conda activate {params.conda}
        fastq-dump \
        --split-spot \
        --skip-technical {wildcards.sra} \
        --stdout 2>{log} \
        | gzip -c > {output}
        """
Can you please help me parallelize this?
Do you have enough resources on the machine? You're requesting 8 threads for a single-threaded process (fastq-dump).
Yes, I have enough resources. It's just a test.
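Try using -j 16. With threads: 8, each fetch_fastq job reserves up to 8 cores, so two jobs can only run at the same time if at least 16 cores are available. With -j 2, Snakemake scales the rule down to 2 threads, and a single job then uses the whole core budget.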
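The arithmetic behind that suggestion can be sketched roughly as follows. This is a simplified model, not Snakemake's actual scheduler: it assumes only the documented behaviour that a rule's threads are capped at the number of cores passed with -j, and it ignores other rules and any resources: constraints. The function names are hypothetical.

```python
def effective_threads(requested: int, cores: int) -> int:
    """Threads one job actually gets: capped at the cores given with -j."""
    return min(requested, cores)


def max_concurrent_jobs(requested: int, cores: int) -> int:
    """How many jobs of a single rule fit into the core budget at once."""
    return cores // effective_threads(requested, cores)


if __name__ == "__main__":
    # threads: 8 as in the fetch_fastq rule, for several -j values
    for cores in (2, 8, 16):
        print(
            f"-j {cores}: {effective_threads(8, cores)} threads per job, "
            f"{max_concurrent_jobs(8, cores)} job(s) at a time"
        )
```

Under the same model, lowering the rule to threads: 1 (fastq-dump is single-threaded anyway) would let -j 2 run two downloads side by side, since max_concurrent_jobs(1, 2) is 2.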