Hi all, I am hoping to get some help solving a vexing issue I have been having running Trimmomatic for SE reads.
Here is the command input I am using to call the program:
java -jar /usr/share/java/trimmomatic-0.36.jar SE -threads 1 -phred33 ../data/ICCRIL07_0240.fq.gz ../results/trimmed_seqs/ICCRIL07_0240_trimmed.gz ILLUMINACLIP:./TruSeq3-SE.fa:2:30:6 HEADCROP:20 SLIDINGWINDOW:4:15 MINLEN:36
and here is the resulting output:
TrimmomaticSE: Started with arguments:
-threads 1 -phred33 ../data/ICCRIL07_0240.fq.gz ../results/trimmed_seqs/ICCRIL07_0240_trimmed.gz ILLUMINACLIP:./TruSeq3-SE.fa:2:30:6 HEADCROP:20 SLIDINGWINDOW:4:15 MINLEN:36
Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Exception in thread "main" java.lang.RuntimeException: Invalid FASTQ comment line: TCGCATCGTTCGATCAGCATTTCGAGTAACTCCTCAACCTGGAGTCCCGCCTGAAGAAGCAGGTGCTGAGATCGGAAGAGC
at org.usadellab.trimmomatic.fastq.FastqParser.parseOne(FastqParser.java:82)
at org.usadellab.trimmomatic.fastq.FastqParser.next(FastqParser.java:179)
at org.usadellab.trimmomatic.TrimmomaticSE.processSingleThreaded(TrimmomaticSE.java:60)
at org.usadellab.trimmomatic.TrimmomaticSE.process(TrimmomaticSE.java:222)
at org.usadellab.trimmomatic.TrimmomaticSE.run(TrimmomaticSE.java:306)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:85)
From what I can tell, it seems the program may think the input is a FASTA file? I'm not really sure though, as the input file is definitely in FASTQ format, and as far as I can tell the data is appropriately formatted. Is it possible that one of the reads is improperly formatted? And if so, is there an easy way to find it?
Not sure if this helps, but when I run with two threads (i.e. -threads 2), the above error still occurs, shutting down one of the threads, but it appears the process still completes (though I'm not sure I can trust those results?)
I'm at sixes and sevens here, and any help would be appreciated.
It looks like there is a mismatch between the fastq "format" that the program expects and the input. Could it be that your input breaks the 4-lines per record "specification" and instead contains line breaks in the sequences? For a quick check, what is the output of
EDIT2: Seems I'm currently in need of a coffee; re-looked at the results of the grep search...
I don't see any entries that break the four-line format... any other ideas?
seqtk may be able to help you. It can convert multi-line FASTQ to 4-line FASTQ.
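If you want to locate the offending read first, here is a minimal Python sketch that walks a (optionally gzipped) FASTQ file four lines at a time and reports the first record that does not fit the layout. It assumes strict 4-line records, and the function name first_bad_record is just something made up for illustration, not part of Trimmomatic or seqtk. As far as I can tell, the "Invalid FASTQ comment line" message is raised when the third line of a record does not begin with '+', so that is one of the checks here.

```python
import gzip
import sys


def first_bad_record(handle):
    """Return (first_line_of_record, reason) for the first record that
    breaks the strict 4-line FASTQ layout, or None if none is found."""
    line_no = 0
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return None  # reached end of file without finding a problem
        start = line_no + 1  # 1-based line number of the record's first line
        line_no += 4
        name, seq, plus, qual = (line.rstrip("\r\n") for line in record)
        if not name.startswith("@"):
            return (start, "name line does not start with '@'")
        if not plus.startswith("+"):
            return (start, "third line does not start with '+'")
        if len(seq) != len(qual):
            return (start, "sequence and quality lengths differ")


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    # Pick a text-mode opener based on the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        result = first_bad_record(fh)
    print(result if result else "no problems found")
```

Save it under any name you like and pass the FASTQ path as the only argument. Only the first problem is reported because once one record is off, every record after it is read out of frame, so later hits would mostly be noise. A trailing blank line at the end of the file will also be flagged as a bad name line.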