Question

cutadapt error while performing Trim_Galore

8.1 years ago

maria.traka ▴ 20

Hi, I've been trying to run trim_galore on 115 libraries. Although 109 of them work perfectly, for 7 of them I get an error message saying that line 1 does not start with '@', which it does! I have gone back to the previous step and regenerated the file, but I still get the same error message. Can you help?

SUMMARISING RUN PARAMETERS
==========================
Input filename: /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.2
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 5 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 60 bp
Running FastQC on the data once trimming has completed

Writing final adapter and quality trimmed output to LIB27930_non_rRNA_unmerged1_trimmed.fq


  >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
This is cutadapt 1.10 with Python 2.7.9
Command line parameters: -f fastq -e 0.1 -q 20 -O 5 -a AGATCGGAAGAGC /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'


Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...

cutadapt trim_galore • 6.2k views

head -n 1 LIB27930_non_rRNA_unmerged1.fastq | sed -n 'l'
@HISEQ:171:CAUJ2ANXX:8:1101:1212:2104 1:N:0:TGTATCGGCCGG$

This is what I get if I look at line 1...

8.1 years ago by maria.traka ▴ 20

The error is actually occurring further down in the file. What does wc -l LIB27930_non_rRNA_unmerged1.fastq report?

8.1 years ago by Devon Ryan 105k

I get: 175654078 LIB27930_non_rRNA_unmerged1.fastq

8.1 years ago by maria.traka ▴ 20
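
A well-formed FASTQ file with four lines per record has a line count divisible by 4. The count reported above, 175654078, leaves a remainder of 2 (43913519 complete records plus two stray lines), which is consistent with a malformed record somewhere in the file. A minimal sketch of that check, assuming unwrapped four-line records; fastq_line_check is a hypothetical helper name and is not part of Trim Galore or cutadapt:

```shell
# Hypothetical helper: report the line count, the number of complete
# 4-line records, and any leftover lines in a FASTQ file.
# Assumes standard unwrapped FASTQ (exactly 4 lines per record).
fastq_line_check() (
    n_lines=$(wc -l < "$1")
    echo "lines=$((n_lines)) records=$((n_lines / 4)) leftover=$((n_lines % 4))"
)

# The same arithmetic applied to the count reported in this thread.
echo "leftover=$((175654078 % 4))"

# Usage on the real file would look like:
#   fastq_line_check LIB27930_non_rRNA_unmerged1.fastq
```

A non-zero leftover only shows that something is wrong; it does not say where. Locating the offending line needs a pass over the file.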

I think "Line 1" just refers to the point it is currently reading, not the first line of the file; it clearly processes the first 40 million sequences. It sounds like you'll have to do some work in the shell to find the line causing the error. I would make a test file of the first 40 million sequences to see if it completes. Also, you should be able to predict from the length of the file how many lines should begin with "@", then count the number of lines that actually begin with "@".

8.1 years ago by BioinfGuru ★ 2.1k
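
The two checks suggested above can be sketched in the shell. The first 40 million sequences correspond to the first 160 million lines (four lines per record). Because a quality string can itself begin with "@" (Phred 31 in ASCII+33), counting every line that starts with "@" can overcount headers. The sketch below instead checks that the first line of each four-line record starts with "@" and the third with "+", and reports only the first violation, since everything after a stray line is shifted. The function names and the output file name are hypothetical, and unwrapped four-line records are assumed.

```shell
# Hypothetical helper: write the first N records (4 lines each) to stdout.
fastq_head_records() (
    head -n "$(( $2 * 4 ))" "$1"
)

# Hypothetical helper: print the first line that breaks the 4-line FASTQ
# layout (header must start with "@", separator with "+") and return 1.
# Returns 0 silently if no such line is found.
fastq_first_bad_line() (
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            printf "line %d: expected @ header, found [%s]\n", NR, $0
            exit 1
        }
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            printf "line %d: expected + separator, found [%s]\n", NR, $0
            exit 1
        }
    ' "$1"
)

# Usage on the real file would look like:
#   fastq_head_records LIB27930_non_rRNA_unmerged1.fastq 40000000 > first40M.fastq
#   fastq_first_bad_line LIB27930_non_rRNA_unmerged1.fastq
```

If the truncated test file trims cleanly, the problem lies beyond the first 40 million records, and the line number printed by the second helper shows where to look.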

score 1 · Answer 1 · 2017-06-26

Update: Thanks to all your help above, I now have a good idea of where the errors are. After a bit more digging, it seems that the latest sortmerna (v2.1), which generated these files, has a bug that introduces errors. Until this is fixed, the suggestion is to increase the memory allocation. I am now trying that and hope it will fix the problem...

score 0 · Answer 2 · 2017-06-24

8.1 years ago

dariober 15k

FASTQ file is expected to start with '@', but found '\n'

If this is true, it seems you have an empty line in your FASTQ file. Try to locate it with:

grep -A 10 -B 10 -n '^$' myreads.fastq

This could help you find out what happened.
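
On a file of this size it may be convenient to first get just the number and positions of empty lines before printing any context. A minimal sketch using common grep options (-c to count, -n to print line numbers, -m to stop after a given number of matches; -m is available in GNU and BSD grep but is not required by POSIX). blank_line_report is a hypothetical helper name, and only completely empty lines are matched, corresponding to the '\n' that cutadapt reports.

```shell
# Hypothetical helper: summarise completely empty lines in a file.
blank_line_report() (
    # Total number of empty lines.
    echo "empty_lines=$(grep -c '^$' "$1")"
    # Line numbers of the first few empty lines (at most 5).
    grep -n -m 5 '^$' "$1" | cut -d: -f1
)

# Usage on the real file would look like:
#   blank_line_report myreads.fastq
```

Any line number printed here can then be inspected with the grep -A/-B command above, or with sed -n 'START,ENDp' for a specific range.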