18 months ago
DanielEB_fisk
Hi biostars community!
I am having trouble looping cutadapt over gzipped samples. This is the script I am using:
#!/bin/bash
#SBATCH --account GRINFISH
#SBATCH -c 8
#SBATCH --mem 96g
#SBATCH --output logfile.out
#SBATCH --error logfile.err
# This script performs trimming for PE sequences with cutadapt and then runs fastqc in the result.
# Set the number of parallel processes/threads to match the allocated CPUs
PARALLEL_PROCESSES=$SLURM_CPUS_PER_TASK
# Setting parameters
LISTFOR=lists/forward-list.txt
LISTREV=lists/reverse-list.txt
TRIMDIR=trim/1stBatch/
ADAPTERS=refs/NexteraPE_NT.fa
# Performing cutadapt
parallel --jobs $PARALLEL_PROCESSES "cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{.}_trimmed_1.fq.gz -p "${TRIMDIR}"{.}_trimmed_2.fq.gz {1} {2}" ::: ${LISTFOR} ::: ${LISTREV}
I suspect the issue is in how the double ("") or single ('') quotes are used in the cutadapt call, but I am not sure. The paths are correct, and both cutadapt and parallel are installed.
Thank you!
Somewhat unrelated, but if you are using SLURM on a cluster, why add the complication of parallel? Simply submit multiple jobs with your samples directly to SLURM.
Hi,
Thank you for taking the time to read and answer!
I am not entirely sure I understand. I am using GNU parallel as a substitute for a for loop over the list of samples. Perhaps you mean something like this?
Not being a parallel user, I missed that application. But isn't this inefficient? You have a single SLURM job in which all the processing happens, and it is constrained to 8 cores. If you were to submit a separate job for each sample, those would likely run faster (within the resources allocated to your account). But this may simply be a matter of how one is used to doing things.
As for your original question, using single quotes on the outside may do the trick. It looks like the expansion of the options is confusing cutadapt.
Thank you!
I don't think I have tried that alternative specifically.
As for whether it is inefficient or not, to be honest I do not really know. My ultimate objective is to loop over a list that contains the file names. The other option would be to use wildcards, but I have tried this and it does not work.
Let's see if it works this time.
Again, thank you for your time
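For reference, the single-quote suggestion above could be sketched roughly as follows. This is not code posted in the thread. It assumes each list file holds one FASTQ path per line, with forward and reverse reads in the same order and no whitespace in the paths. The function name `trim_pairs` is made up for illustration. Two details beyond quoting may also matter: GNU parallel's `:::` passes its operands as literal arguments, whereas `::::` reads arguments from the named files, and `--link` pairs the two lists line by line instead of combining every forward read with every reverse read.

```shell
#!/bin/bash
# Sketch only: assumes one FASTQ path per line in each list file,
# with forward and reverse reads listed in the same order.

# Exported so the shells that parallel starts can expand them.
export TRIMDIR=trim/1stBatch/
export ADAPTERS=refs/NexteraPE_NT.fa

# trim_pairs FORWARD_LIST REVERSE_LIST [extra parallel options, e.g. --dry-run]
# (hypothetical helper name, not from the original script)
trim_pairs() {
  local listfor=$1 listrev=$2
  shift 2
  # Single quotes: bash hands the template to parallel untouched.
  # --link pairs line N of the first list with line N of the second.
  # :::: reads arguments from the files; ::: would pass the file names themselves.
  # {1/.} is the first argument's basename minus its last extension.
  parallel --jobs "${SLURM_CPUS_PER_TASK:-1}" --link "$@" \
    'cutadapt -a file:"$ADAPTERS" -A file:"$ADAPTERS" -o "$TRIMDIR"{1/.}_trimmed_1.fq.gz -p "$TRIMDIR"{1/.}_trimmed_2.fq.gz {1} {2}' \
    :::: "$listfor" :::: "$listrev"
}
```

Calling `trim_pairs lists/forward-list.txt lists/reverse-list.txt --dry-run` should print the cutadapt commands without running them, which is a cheap way to check the quoting. Note that `{1/.}` strips only the final extension, so an input ending in `.fastq.gz` keeps `.fastq` in the output name.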
This question has been answered by GenoMax in a previous comment.
Was it this comment? I can move that to answer (which you can accept) to provide closure to this thread.
It was the last one. Every now and then I do use parallel, but the for loop with wrap has proven to be the most useful (it worked).
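The thread does not show the final code, so the following is only a guess at what "the for loop with wrap" might look like, assuming it refers to `sbatch --wrap` submitting one job per sample. The account, core count, and memory values are copied from the `#SBATCH` header in the question and would probably need adjusting per sample. `submit_trim_jobs` and the `DRY_RUN` switch are hypothetical names, and paths are assumed to contain no whitespace.

```shell
#!/bin/bash
# Sketch only: one SLURM job per read pair, submitted from the login node.

TRIMDIR=trim/1stBatch/
ADAPTERS=refs/NexteraPE_NT.fa

# submit_trim_jobs FORWARD_LIST REVERSE_LIST
# Set DRY_RUN=1 (hypothetical switch) to print the sbatch calls instead of submitting.
submit_trim_jobs() {
  local fwd rev sample
  # paste puts line N of both lists on one line, so the pairs stay in step.
  paste "$1" "$2" | while read -r fwd rev; do
    sample=$(basename "$fwd")
    sample=${sample%%.*}   # drop everything from the first dot onward
    ${DRY_RUN:+echo} sbatch --account GRINFISH -c 8 --mem 96g \
      --job-name "trim_${sample}" \
      --wrap "cutadapt -j 8 -a file:${ADAPTERS} -A file:${ADAPTERS} -o ${TRIMDIR}${sample}_trimmed_1.fq.gz -p ${TRIMDIR}${sample}_trimmed_2.fq.gz ${fwd} ${rev}"
  done
}
```

Running `DRY_RUN=1; submit_trim_jobs lists/forward-list.txt lists/reverse-list.txt` prints one `sbatch` line per pair for inspection. With this layout each sample gets its own allocation, which is the arrangement GenoMax describes above.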