Question

fastqc Exception in thread "Thread-1" (error)

9.7 years ago

Assa Yeroslaviz ★ 1.9k

Hi,

I am running FastQC (v0.11.4) on Ubuntu 14.04.3 LTS.
I have four FASTQ files (two paired-end samples). AFAIK they come from old Solexa machines and are in Sanger format.
When I try to run FastQC on the _1 files, I get the following error message:

fastqc -t 12 -o ../Results/1c3c603f-29ac-4263-851d-b19f9ce4cfb0/fastqcResults/ 61627AAXX_1_1.fastq.gz
Started analysis of 61627AAXX_1_1.fastq.gz
Exception in thread "Thread-1" java.lang.IllegalArgumentException: Unexpected cs char C
        at uk.ac.babraham.FastQC.Sequence.FastQFile.convertColorspaceToBases(FastQFile.java:334)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:191)
        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
        at java.lang.Thread.run(Thread.java:745)

Then nothing happens.
When I use the same command on the _2 files, it works fine.

fastqc -t 12 -o ../Results/1c3c603f-29ac-4263-851d-b19f9ce4cfb0/fastqcResults/ 61627AAXX_1_2.fastq.gz
Started analysis of 61627AAXX_1_2.fastq.gz
Approx 5% complete for 61627AAXX_1_2.fastq.gz
...

I can't see any difference in the format of the two files.

The first records of the two files from one of the pairs look like this:

zcat 61GAFAAXX_1_1.fastq.gz | head -n 12
@SOLEXA12_1:1:1:990:4777/1 1:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:11674/1 1:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:17662/1 1:Y:0:0
..................................................
+
##################################################

and

zcat 61GAFAAXX_1_2.fastq.gz | head -n 12
@SOLEXA12_1:1:1:990:4777/2 2:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:11674/2 2:Y:0:0
..................................................
+
##################################################
@SOLEXA12_1:1:1:990:17662/2 2:Y:0:0
..................................................
+
##################################################

Any ideas why I can't run FastQC on the _1 files?

thanks
Assa

P.S.
When I run the SolexaQA++ tool, it reads all four files without difficulty.

fastq fastqc • 7.6k views

ADD COMMENT • link 9.7 years ago by Assa Yeroslaviz ★ 1.9k

Are the lines that are just "..................." actually there or did you just censor the sequence? If those are actually there then the fastq files aren't valid and I'm not surprised that fastqc is complaining.

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.7 years ago by Devon Ryan 105k

Yes, they are really there. But FastQC complains only about the two _1 files of the paired-end pairs; the _2 partners run without a problem. For that reason I don't think the "." in the sequence is the cause.

I took multiple subsets of the data and narrowed the problem down to the record in rows 841-844. With only the first 840 rows FastQC runs, but if I add the next four lines it gives me the error message.

Unfortunately, I can't see any difference between these four rows and the rest of the data.
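
For anyone who wants to repeat this kind of narrowing down, here is a minimal sketch of how the subsets can be cut on record boundaries (4 lines per FASTQ record), so that no record is split in half. The function name `first_records` is made up for illustration; it just reads FASTQ text on stdin.

```shell
# Hypothetical helper: keep the first N FASTQ records from stdin.
# A FASTQ record is 4 lines, so N records = N * 4 lines.
first_records() {
    head -n "$(( $1 * 4 ))"
}
```

With this, `zcat 61627AAXX_1_1.fastq.gz | first_records 210 > subset.fastq` would keep rows 1-840, and `first_records 211` would keep rows 1-844. The output file name `subset.fastq` is only an example.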

ADD REPLY • link 9.7 years ago by Assa Yeroslaviz ★ 1.9k

It's assuming that you have colorspace data since that's the only place "." is valid. However, it looks like you instead have non colorspace data (i.e., you probably have normal data), which is what's confusing it and causing the errors. Where did these files come from (i.e., what type of machine and when)?

ADD REPLY • link 9.7 years ago by Devon Ryan 105k

These are Illumina (Solexa) reads from cancer patients: 50 bp paired-end reads.

But this doesn't explain why the _2 files can be read by FastQC and the _1 files can't.

ADD REPLY • link updated 5.7 years ago by Ram 45k • written 9.7 years ago by Assa Yeroslaviz ★ 1.9k

Are the FASTQ fragments you posted above from those suspicious lines? With the given FASTQ lines, FastQC does not complain in my case.

Edit: Sorry, I overlooked the 'head' command. Could you paste lines 836 to, e.g., 852?

ADD REPLY • link 9.7 years ago by dschika ▴ 320

These are rows 837-852:

@SOLEXA9_1:1:1:1072:8268/1 1:Y:0:0
A.................................................
+
##################################################
@SOLEXA9_1:1:1:1072:10325/1 1:Y:0:0
GC................................................
+
##################################################
@SOLEXA9_1:1:1:1072:14294/1 1:Y:0:0
CT................................................
+
##################################################
@SOLEXA9_1:1:1:1073:8096/1 1:Y:0:0
TG................................................
+
##################################################

But I can't see any differences.

I have uploaded the first 852 rows of my file here. Maybe someone can test them and see whether FastQC runs on their machine.

thanks

ADD REPLY • link 9.7 years ago by Assa Yeroslaviz ★ 1.9k

The "." is valid in a csfasta file but not a normal fastq file. Likewise, a sequence starting with GC is invalid in a csfastq file (you can have one and only one base at the beginning AFAIK...and it's usually T from what I remember). My only guess is that this was originally color space data and someone tried to convert it to base space at some point.

Edit: This also explains why you got the "unexpected cs char C" message, since "cs" means "color space".
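
If you want to find all such reads, something along these lines might help. It is only a rough sketch based on the reasoning above: it assumes that a read containing "." gets treated as color space, and that a color space read may have only one leading base. It is not how FastQC itself does the detection, and the function name `find_mixed_reads` is made up for illustration.

```shell
# Hypothetical helper: read FASTQ text on stdin and report sequence lines
# (line 2 of every 4-line record) that contain '.' AND start with two
# base letters, i.e. lines that are neither clean base space nor a
# plausible color space read with a single leading base.
find_mixed_reads() {
    awk '
        NR % 4 == 2 && /\./ && /^[ACGTNacgtn][ACGTNacgtn]/ {
            printf "line %d: %s\n", NR, $0
        }'
}
```

Run it as `zcat 61627AAXX_1_1.fastq.gz | find_mixed_reads | head`. On the rows posted above it would flag the GC, CT and TG reads but not the one that starts with a single A, which matches the error only appearing once rows 841-844 are included.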

ADD REPLY • link 9.7 years ago by Devon Ryan 105k