Hi, I've been trying to run trim_galore on 115 libraries. Although 109 of them work perfectly, for 7 of them I get the following message saying that Line 1 does not start with a @, which it does! I have gone back to the previous step and regenerated the file, but I still get the same error message. Can you help?
SUMMARISING RUN PARAMETERS
==========================
Input filename: /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.2
Cutadapt version: 1.10
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 5 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 60 bp
Running FastQC on the data once trimming has completed
Writing final adapter and quality trimmed output to LIB27930_non_rRNA_unmerged1_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
40000000 sequences processed
This is cutadapt 1.10 with Python 2.7.9
Command line parameters: -f fastq -e 0.1 -q 20 -O 5 -a AGATCGGAAGAGC /tgac/workarea/collaborators/traka/ESCAPE/Step2_merged/LIB27930_non_rRNA_unmerged1.fastq
Trimming 1 adapter with at most 10.0% errors in single-end mode ...
cutadapt: error: Line 1 in FASTQ file is expected to start with '@', but found '\n'
Cutadapt terminated with exit signal: '256'.
Terminating Trim Galore run, please check error message(s) to get an idea what went wrong...
This is what I get if I look at line 1...
This is actually happening further down in the file. What's the output of
wc -l LIB27930_non_rRNA_unmerged1.fastq
?
I get: 175654078 LIB27930_non_rRNA_unmerged1.fastq
I think "Line 1" is just relative to whatever it is currently reading, not the start of the file: it clearly processes the first 40 million sequences before failing. Sounds like you'll have to do some file extraction work in the shell to find the line causing the error. I would make a test file of the first 40 million sequences (the first 160 million lines, at four lines per record) to see if it completes. Also, you should be able to predict from the length of the file how many lines should begin with "@" (one per four-line record), then count the number of lines that actually begin with "@".
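
To make that search concrete, here is a rough Python sketch that walks the file and reports the first line that breaks the FASTQ layout. It assumes plain four-line records (header, sequence, "+" separator, quality) with no wrapped sequences. The function name first_bad_line is only an illustration, not part of Trim Galore or Cutadapt. Note that 175654078 is not a multiple of 4 (it leaves a remainder of 2), which under the four-line assumption already suggests extra or missing lines somewhere in the file. Checking by position is also safer than simply counting lines that start with "@", because a quality line can legitimately begin with "@".

```python
import sys


def first_bad_line(path):
    """Return (line_number, line_text) for the first line that breaks the
    assumed 4-line FASTQ layout, or None if no problem is found."""
    total = 0
    with open(path) as handle:
        for total, line in enumerate(handle, start=1):
            slot = (total - 1) % 4  # 0=header, 1=sequence, 2=separator, 3=quality
            if slot == 0 and not line.startswith("@"):
                return total, line
            if slot == 2 and not line.startswith("+"):
                return total, line
    # A line count that is not a multiple of 4 means the last record is incomplete
    if total % 4 != 0:
        return total, "<file ends mid-record>"
    return None


if __name__ == "__main__" and len(sys.argv) == 2:
    problem = first_bad_line(sys.argv[1])
    if problem is None:
        print("No layout problems found")
    else:
        print("First problem at line %d: %r" % problem)
```

If this is saved as, say, check_fastq.py (a hypothetical file name), it could be run as python check_fastq.py LIB27930_non_rRNA_unmerged1.fastq. The reported line number counts from the start of the file, so the lines around it can then be inspected in the shell.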