Question

Cutadapt issues when running nfcore pipeline

2.6 years ago

aroso491 • 0

Hello,

I am trying to run the nf-core methylseq pipeline, as I have done before with the same type of .fastq.gz files. The difference is that I am now on a different computer, where I had to set up all the packages myself, and I don't have a clear idea of what state the installation is in. I am working alone and have no one at my university who can support me with this.

This is the error log that I'm obtaining after running the pipeline, specifically for cutadapt:

AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> PN0341_0002_S2_L001_R1_001.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type    Count   Sequence    Sequences analysed  Percentage
Illumina    434 AGATCGGAAGAGC   1000000 0.04
smallRNA    0   TGGAATTCTCGG    1000000 0.00
Nextera 0   CTGTCTCTTATA    1000000 0.00
Using Illumina adapter for trimming (count: 434). Second best hit was smallRNA (count: 0)

Writing report to 'PN0341_0002_S2_L001_R1_001.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: PN0341_0002_S2_L001_R1_001.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 3.4
Python version: could not detect
Number of cores used for trimming: 4
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 3.4). Setting -j 4
Writing final adapter and quality trimmed output to PN0341_0002_S2_L001_R1_001_trimmed.fq.gz


  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file PN0341_0002_S2_L001_R1_001.fastq.gz <<< 
This is cutadapt 3.4 with Python 3.8.8
Command line parameters: -j 4 -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC PN0341_0002_S2_L001_R1_001.fastq.gz
Run "cutadapt --help" to see command-line options.
See https://cutadapt.readthedocs.io/ for full documentation.

cutadapt: error: [Errno 2] No such file or directory


Cutadapt terminated with exit signal: '512'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

I have no other error messages, so I have no idea what could have gone wrong.

What I do know is that the file is definitely being read: I give the nf-core pipeline an input path, and it picks up the .fastq.gz files that match the naming convention. This is the right file, with the right name, in the specified directory, so I suspect (I might be wrong) that the error comes from some outdated parameter, but I cannot figure out which.

Trim Galore version is 0.6.6 and Cutadapt version is 3.4.

methylseq trimgalore cutadapt nfcore • 1.6k views

updated 2.6 years ago by Matthias Zepper 5.0k • written 2.6 years ago by aroso491 • 0

Well, the error clearly says No such file or directory, but I see how that is hardly possible in the middle of a tried and tested pipeline.

Chances are slim, but just to rule out the obvious first: are there any blanks (spaces) in the data or output path, something like /home/username/bioinfo/data/my cool project/reads.fq? It may be that the pipeline authors forgot to quote a variable somewhere...
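One quick way to rule this out is to test each path for whitespace before handing it to the pipeline. This is only a sketch: has_blank is a hypothetical helper, and the second path is a made-up placeholder for comparison.

```shell
# Hypothetical helper: succeed (exit 0) if the given path contains whitespace.
has_blank() {
  case "$1" in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example paths; replace them with your own input and output directories.
for p in "/home/username/bioinfo/data/my cool project/reads.fq" \
         "/home/username/bioinfo/data/project/reads.fq"; do
  if has_blank "$p"; then
    echo "contains whitespace: $p"
  else
    echo "no whitespace: $p"
  fi
done
```

If a path is flagged, renaming the directory or symlinking it to a blank-free location is usually simpler than hunting for the unquoted variable.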

2.6 years ago by Matthias Zepper 5.0k

That error is exactly what puzzles me. I am not hardcoding the file name (PN0341_0002_S2_L001_R1_001.fastq.gz); I pass a directory and tell the pipeline to read any ".fastq.gz" file in it. Since the log shows that name, the pipeline is clearly finding the file, which is complete and which I can see right in front of me.

I cannot spot any blanks, and the output path is fine: I do get an output file in the output path I expected, it is just empty because the pipeline terminates with errors.

I have also checked that I have ~1 TB of free disk space for the analysis (more than enough). My theory is that this has something to do with my Singularity installation, but I am not familiar with it, so I am not sure how to start checking whether that is the cause.

2.6 years ago by aroso491 • 0

If the standard test command, nextflow run nf-core/methylseq -profile test,singularity, runs just fine, I don't think Singularity is the problem.
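Before running the test profile, it can help to confirm that both executables are actually on the PATH of the new machine. A minimal sketch, assuming a POSIX shell; check_tool is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if a command is on the PATH and print where it lives.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found:   $1 -> $(command -v "$1")"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the two tools needed for the test command; keep going if one is missing.
for tool in nextflow singularity; do
  check_tool "$tool" || true
done
```

If both are found, nextflow -version and singularity --version print the installed versions, which are worth including in any follow-up post.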

But methylseq isn't a DSL2 pipeline with nice containerized modules yet, so maybe you are right. The error may indeed refer not to your data but to a software dependency or to reference data. Check the pipeline_info subfolder of your output directory: are there version files for FastQC, Cutadapt and Trim Galore (v_fastqc.txt, v_cutadapt.txt, v_trim_galore.txt)? That would corroborate that all software dependencies for this step are present in the $PATH and functional...
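The check can be scripted roughly as follows. This is a sketch under assumptions: check_versions is a hypothetical helper, and "results" is a placeholder for whatever output directory you passed to the pipeline; only the three file names come from the pipeline itself.

```shell
# Hypothetical helper: look for the three version files in <outdir>/pipeline_info.
# Returns non-zero if at least one of them is missing.
check_versions() {
  info_dir="$1/pipeline_info"
  missing=0
  for f in v_fastqc.txt v_cutadapt.txt v_trim_galore.txt; do
    if [ -f "$info_dir/$f" ]; then
      # Show the first line so an empty or error-filled file is easy to spot.
      echo "found:   $f: $(head -n 1 "$info_dir/$f")"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# "results" is a placeholder; substitute your own output directory.
check_versions "results" || echo "at least one version file is missing"
```

A file that exists but contains an error message instead of a version string points to the same problem as a missing file, so it is worth reading the printed first lines rather than relying on the exit status alone.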

2.6 years ago by Matthias Zepper 5.0k