Question

Cutadapt error: too many parameters.

0

2.2 years ago

DanielEB_fisk ▴ 20

Hi biostars community!

I am having trouble looping cutadapt over gzipped samples. This is the script I am using:

#!/bin/bash
#SBATCH --account GRINFISH
#SBATCH -c 8
#SBATCH --mem 96g
#SBATCH --output logfile.out
#SBATCH --error logfile.err

# This script performs trimming for PE sequences with cutadapt and then runs fastqc in the result.

# Set the number of parallel processes/threads to match the allocated CPUs
PARALLEL_PROCESSES=$SLURM_CPUS_PER_TASK

# Setting parameters

LISTFOR=lists/forward-list.txt
LISTREV=lists/reverse-list.txt
TRIMDIR=trim/1stBatch/
ADAPTERS=refs/NexteraPE_NT.fa

# Performing cutadapt

parallel --jobs $PARALLEL_PROCESSES "cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{.}_trimmed_1.fq.gz -p "${TRIMDIR}"{.}_trimmed_2.fq.gz {1} {2}" ::: ${LISTFOR} ::: ${LISTREV}

I suspect the issue is in how the double ("") or single ('') quotes are used in the cutadapt call, but I am not sure. The paths are correct, and both cutadapt and parallel are installed.

Thank you!

cutadapt parallel bash • 2.3k views

ADD COMMENT • link updated 16 months ago by ATpoint 88k • written 2.2 years ago by DanielEB_fisk ▴ 20

1

Somewhat unrelated but if you are using SLURM on a cluster why add the complication of parallel? Simply submit multiple jobs with your samples directly to SLURM.

ADD REPLY • link 2.2 years ago by GenoMax 152k

0

Hi,

Thank you for taking the time to read and answer!

I am not entirely sure I understand. I am using GNU parallel as a substitute for a for loop to iterate over the list of samples. Perhaps you mean something like this?

#!/bin/bash
#SBATCH --account MyAcount
#SBATCH -c 8
#SBATCH --mem 96g
#SBATCH --output logfile.out
#SBATCH --error logfile.err

cutadapt -a file:refs/NexteraPE_NT.fa -A file:refs/NexteraPE_NT.fa -o trim/1stBatch/{.}_trimmed_1.fq.gz -p trim/1stBatch/{.}_trimmed_2.fq.gz  folder/folder/*/*_1.fq.gz folder/folder/*/*_2.fq.gz
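For what it is worth, a likely reason the wildcard form fails: the shell expands each glob to every matching file before cutadapt starts, so a single call receives all samples at once, while a paired-end run takes two input files. Also, {.} is a GNU parallel replacement string and stays literal outside parallel. A minimal sketch of the first point, using made-up file names in a temporary directory (count_args is a hypothetical helper, not part of any tool):

```bash
#!/bin/bash
# Sketch: a glob expands to every matching file, so one command line
# receives all samples at once. File names here are made up for the demo.

tmp=$(mktemp -d)
touch "$tmp"/sampleA_1.fq.gz "$tmp"/sampleA_2.fq.gz \
      "$tmp"/sampleB_1.fq.gz "$tmp"/sampleB_2.fq.gz

# count_args (hypothetical helper) reports how many positional
# arguments a command would see after the shell has expanded the globs.
count_args() { echo "$#"; }

# What `cutadapt ... *_1.fq.gz *_2.fq.gz` would receive as input files:
n_inputs=$(count_args "$tmp"/*_1.fq.gz "$tmp"/*_2.fq.gz)
echo "positional inputs: $n_inputs"   # more than the two a paired run takes

rm -rf "$tmp"
```

With two samples the command already sees four input files, which is why each cutadapt call needs exactly one forward and one reverse file.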

ADD REPLY • link 2.2 years ago by DanielEB_fisk ▴ 20

1

using GNU parallel as a substitute for a for loop

Not being a parallel user I missed that application. But isn't this inefficient? You have a single SLURM job in which all the processing happens, and it is constrained to 8 cores. If you were to submit a separate job for each sample, those would likely finish sooner (within the resources allocated to your account). But this may simply be a matter of how one is used to doing things.

As for your original question, using single quotes on the outside may do the trick. It looks like the expansion of the options is confusing cutadapt.

parallel --jobs $PARALLEL_PROCESSES 'cutadapt -a file:"${ADAPTERS}" -A file:"${ADAPTERS}" -o "${TRIMDIR}"{.}_trimmed_1.fq.gz -p "${TRIMDIR}"{.}_trimmed_2.fq.gz {1} {2}' ::: ${LISTFOR} ::: ${LISTREV}
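One caveat with the single-quoted form, as far as I can tell: the variables are then expanded by the shell that parallel starts for each job, not by the script itself, so they would need to be exported to be visible there. The sketch below illustrates only the quoting behaviour. It uses sh -c as a stand-in for that child shell (an assumption made so that it runs without parallel installed) and reuses the ADAPTERS variable from the question.

```bash
#!/bin/bash
# Sketch: the outer quotes decide which shell expands a variable.
# `sh -c` stands in for the child shell that a job runner starts;
# ADAPTERS mirrors the variable name used in the question.

ADAPTERS=refs/NexteraPE_NT.fa   # set but NOT exported

# Double quotes: the calling shell expands the variable before the child runs.
double_quoted=$(sh -c "echo file:${ADAPTERS}")

# Single quotes: the child shell expands it and finds nothing unless exported.
single_unexported=$(sh -c 'echo file:${ADAPTERS}')

export ADAPTERS
single_exported=$(sh -c 'echo file:${ADAPTERS}')

printf '%s\n' "$double_quoted" "$single_unexported" "$single_exported"
```

Only the middle case loses the adapter path, so with outer single quotes it is probably safest to export ADAPTERS and TRIMDIR first.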

ADD REPLY • link 2.2 years ago by GenoMax 152k

0

Thank you!

I don't think I have tried that alternative specifically.

As for whether it is inefficient, to be honest I do not really know. My ultimate objective is to loop over a list that contains the file names. The other option would be to use wildcards, but I have tried this and it does not work.

Let's see if it works this time.

Again, thank you for your time
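For the list-based goal described here, a plain bash loop can read both lists in lockstep so that every cutadapt call gets exactly one forward and one reverse file. This is a sketch under assumptions, not a tested pipeline: it assumes one path per line, the same sample order in both lists, no whitespace in paths, and a _1.fq.gz suffix on the forward files; pair_commands is a hypothetical helper name. As a side note, in GNU parallel ::: takes literal arguments while :::: reads them from a file, so the original call may have passed the two list files themselves as the inputs.

```bash
#!/bin/bash
# Sketch: walk two list files in lockstep and build one cutadapt call per pair.
# Assumes one path per line, same sample order in both lists, and no
# whitespace inside paths. pair_commands is a hypothetical helper name.

pair_commands() {
  local listfor=$1 listrev=$2 trimdir=$3 adapters=$4
  local fwd rev sample
  # paste joins line N of each list with a tab, so mates stay together.
  paste "$listfor" "$listrev" | while read -r fwd rev; do
    sample=$(basename "$fwd" _1.fq.gz)   # assumes the _1.fq.gz suffix
    echo cutadapt -a "file:${adapters}" -A "file:${adapters}" \
      -o "${trimdir}${sample}_trimmed_1.fq.gz" \
      -p "${trimdir}${sample}_trimmed_2.fq.gz" \
      "$fwd" "$rev"
  done
}

# Dry run: prints the commands; drop the echo inside the loop to execute them.
# pair_commands lists/forward-list.txt lists/reverse-list.txt trim/1stBatch/ refs/NexteraPE_NT.fa
```

Printing the commands first makes it easy to check that each line ends with exactly two input files before anything is run.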

ADD REPLY • link 2.2 years ago by DanielEB_fisk ▴ 20

0

This question was answered by GenoMax in a previous comment.

ADD REPLY • link 2.0 years ago by DanielEB_fisk ▴ 20

0

Was it this comment? I can move that to answer (which you can accept) to provide closure to this thread.

ADD REPLY • link 2.0 years ago by GenoMax 152k

0

It was the last one. Every now and then I do use parallel, but the for loop with --wrap has proven the most useful (it worked).

ADD REPLY • link 2.0 years ago by DanielEB_fisk ▴ 20

score 2 · Accepted Answer · 2023-05-26

Just for illustrative purposes, you could submit multiple SLURM jobs as follows. Remove the word echo before sbatch if all command lines look correct. (Ref: cutadapt loop and paired-end reads)

for i in *_R1.fastq.gz
do
  SAMPLE=$(echo ${i} | sed "s/_R1\.fastq\.gz//")
  echo ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz
  # The two input files go last, after the output options
  echo sbatch -p Partition --account GRINFISH --mem=NNg -c 8 -o log.out -e log.err --wrap="cutadapt -a file:refs/NexteraPE_NT.fa -A file:refs/NexteraPE_NT.fa -o ${SAMPLE}_trimmed_1.fq.gz -p ${SAMPLE}_trimmed_2.fq.gz ${SAMPLE}_R1.fastq.gz ${SAMPLE}_R2.fastq.gz"
done
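A small optional variation, offered as a sketch rather than a correction: the sample name can be derived with bash parameter expansion instead of sed, and giving each job its own log names avoids every job writing to the same log.out and log.err. The sample_name helper and the per-sample log naming are assumptions for illustration only.

```bash
#!/bin/bash
# Sketch: derive the sample name with parameter expansion instead of sed,
# and build per-sample log names. sample_name is a hypothetical helper.

sample_name() {
  # ${1%_R1.fastq.gz} strips that suffix from the end of the file name.
  printf '%s\n' "${1%_R1.fastq.gz}"
}

for i in *_R1.fastq.gz; do
  [ -e "$i" ] || continue   # skip the literal pattern when nothing matches
  SAMPLE=$(sample_name "$i")
  echo "${SAMPLE}: logs -> ${SAMPLE}.out ${SAMPLE}.err"
done
```

The resulting names could then be passed to sbatch as -o ${SAMPLE}.out -e ${SAMPLE}.err in the loop above.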