Fastqc error: Cannot encode phred score
4.1 years ago

I'm new to RNA sequencing analysis and am trying to run QC on a couple of RNA sequencing runs. These were performed on an Illumina platform (NextSeq 500). The files that were returned to us were BAM files.

When I attempted to run fastqc (0.11.5) I immediately got an error:

Failed to process bamfile.bam
java.lang.IllegalArgumentException: Cannot encode phred score: 239
    at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:369)
    at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:357)
    at net.sf.samtools.SAMUtils.phredToFastq(SAMUtils.java:343)
    at net.sf.samtools.SAMRecord.getBaseQualityString(SAMRecord.java:248)
    at uk.ac.babraham.FastQC.Sequence.BAMFile.readNext(BAMFile.java:144)
    at uk.ac.babraham.FastQC.Sequence.BAMFile.<init>(BAMFile.java:65)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:100)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:129)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:102)
    at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

So there are some out-of-range Phred scores. I don't know how to fix this. Any suggestions as to what I should do next would be welcome, thank you!
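A note on the message itself: in the Phred+33 (Sanger) convention, each quality score is stored as the ASCII character with code score + 33, and the SAM specification limits QUAL to the printable range '!' (33) to '~' (126). That makes 93 the largest score that can be written as a FASTQ character, so a stored value of 239 cannot be encoded, which is consistent with the exception above. The sketch below only illustrates that range check. It is not the FastQC or SAM library source, and the function name is made up for illustration.

```python
# Phred+33 ("Sanger") encoding uses printable ASCII 33 ('!') through 126 ('~').
PHRED33_OFFSET = 33
MAX_PRINTABLE = 126
MAX_PHRED = MAX_PRINTABLE - PHRED33_OFFSET  # 93


def phred_to_fastq_char(score: int) -> str:
    """Encode one Phred score as a Phred+33 character (hypothetical helper)."""
    if not 0 <= score <= MAX_PHRED:
        # Mirrors the wording of the error in the thread; illustrative only.
        raise ValueError(f"Cannot encode phred score: {score}")
    return chr(score + PHRED33_OFFSET)
```

Calling `phred_to_fastq_char(40)` gives `'I'`, a typical high-quality Illumina character, while `phred_to_fastq_char(239)` raises, because 239 + 33 is far beyond the last printable character.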

RNA-Seq • 1.8k views

Are these unaligned BAM files?


These are unaligned!**

**Actually, maybe not. I used samtools to get some information about the BAM file:

samtools view -H L1_1_sequence_unmapped_barcodes_phix.bam
@SQ SN:phiX LN:5386
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:bwa sampe -f L1_1_sequence_unmapped_barcodes_phix.sam /home/Genomes/bwa_indexes/phiX.fa L1_1_sequence_unmapped_barcodes_phix1.sai L1_1_sequence_unmapped_barcodes_phix2.sai L1_1_sequence_unmapped_barcodes.fastq L1_2_sequence_unmapped_barcodes.fastq
@PG ID:samtools PN:samtools PP:bwa  VN:1.10 CL:samtools view -H L1_1_sequence_unmapped_barcodes_phix.bam

This is not an unaligned file. It appears to have been aligned against phiX, so the reads of interest should be the ones that did not align. BTW: does unmapped_barcodes in the file name have any meaning?

Can you convert the BAM file back to FASTQ (use samtools fastq, and check its options carefully) and then run FastQC on the reads?
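As a rough sketch of that conversion, the helper below builds a name-sort step piped into `samtools fastq`, writing the two mates to separate files. The flags used (`sort -n`, and `fastq -1/-2/-0/-s/-n`) are standard samtools options, but the file names and the function names are placeholders, and splitting the mates is an assumption based on the `bwa sampe` line in the header suggesting paired reads.

```python
import subprocess
from typing import List, Tuple


def build_bam_to_fastq_commands(bam: str, read1: str, read2: str) -> Tuple[List[str], List[str]]:
    """Return (sort command, fastq command); all paths are placeholders."""
    # Name-sort so that both mates of a pair are adjacent; BAM goes to stdout.
    sort_cmd = ["samtools", "sort", "-n", bam]
    # -1/-2: mate files; -0 and -s: discard other and singleton reads;
    # -n: leave read names unchanged; "-": read BAM from stdin.
    fastq_cmd = [
        "samtools", "fastq",
        "-1", read1, "-2", read2,
        "-0", "/dev/null", "-s", "/dev/null",
        "-n", "-",
    ]
    return sort_cmd, fastq_cmd


def run_pipeline(sort_cmd: List[str], fastq_cmd: List[str]) -> None:
    """Run `sort_cmd | fastq_cmd`; requires samtools on PATH."""
    sorter = subprocess.Popen(sort_cmd, stdout=subprocess.PIPE)
    converter = subprocess.Popen(fastq_cmd, stdin=sorter.stdout)
    sorter.stdout.close()  # let the sorter see a closed pipe if the converter exits
    converter_rc = converter.wait()
    sorter_rc = sorter.wait()
    if sorter_rc != 0 or converter_rc != 0:
        raise RuntimeError(f"samtools failed (sort={sorter_rc}, fastq={converter_rc})")
```

For example, `run_pipeline(*build_bam_to_fastq_commands("in.bam", "reads_1.fq", "reads_2.fq"))` would run the two steps. Whether to keep or discard singletons, or to filter for unmapped reads first, depends on what you want to QC, so check `samtools fastq` options against your samtools version.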


Thank you. I heard back from the core and was told that unmapped_barcodes denotes the reads from the Illumina control, which don't match the barcodes of my actual samples. I have two BAM files, generated from a paired-end sequencing run.

I sorted the BAM files (I was told they were complete), converted each one back to FASTQ independently, and ran FastQC again on each FASTQ file.

I'm getting a new error this time:

Started analysis of 3180L_L1_1.sorted.fq
Approx 5% complete for 3180L_L1_1.sorted.fq
Approx 10% complete for 3180L_L1_1.sorted.fq
Approx 15% complete for 3180L_L1_1.sorted.fq
Approx 20% complete for 3180L_L1_1.sorted.fq
Approx 25% complete for 3180L_L1_1.sorted.fq
Approx 30% complete for 3180L_L1_1.sorted.fq
Approx 35% complete for 3180L_L1_1.sorted.fq
Approx 40% complete for 3180L_L1_1.sorted.fq
Approx 45% complete for 3180L_L1_1.sorted.fq
Approx 50% complete for 3180L_L1_1.sorted.fq
Approx 55% complete for 3180L_L1_1.sorted.fq
Approx 60% complete for 3180L_L1_1.sorted.fq
Approx 65% complete for 3180L_L1_1.sorted.fq
Approx 70% complete for 3180L_L1_1.sorted.fq
Approx 75% complete for 3180L_L1_1.sorted.fq
Approx 80% complete for 3180L_L1_1.sorted.fq
Approx 85% complete for 3180L_L1_1.sorted.fq
Approx 90% complete for 3180L_L1_1.sorted.fq
Approx 95% complete for 3180L_L1_1.sorted.fq
Analysis complete for 3180L_L1_1.sorted.fq
Failed to process file 3180L_L1_1.sorted.fq
java.lang.IllegalArgumentException: No known encodings with chars < 33 (Yours was )
    at uk.ac.babraham.FastQC.Sequence.QualityEncoding.PhredEncoding.getFastQEncodingOffset(PhredEncoding.java:32)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.getPercentages(PerBaseQualityScores.java:71)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.raisesError(PerBaseQualityScores.java:166)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:745)

Same error for the other fastq file.


There are many different types of BAM files. Could you tell us whether yours is sorted, aligned, or something else?

Personally, I prefer to QC FASTQ files rather than BAM files, because FASTQ can be manipulated if needed. So I would recommend converting the BAM to FASTQ and then running FastQC to see the results.


I sorted the BAM files (paired-end run), converted them to FASTQ, and attempted to run FastQC again.

I'm getting another error that looks related to the phred scores:

Analysis complete for 3180L_L1_1.sorted.fq
Failed to process file 3180L_L1_1.sorted.fq
java.lang.IllegalArgumentException: No known encodings with chars < 33 (Yours was )
    at uk.ac.babraham.FastQC.Sequence.QualityEncoding.PhredEncoding.getFastQEncodingOffset(PhredEncoding.java:32)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.getPercentages(PerBaseQualityScores.java:71)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.raisesError(PerBaseQualityScores.java:166)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:745)

I looked at the first few lines of the fastq file:

@NB501288_371_HGFMMBGX7:1:11101:16319:1208#GGGGGGAGATCT
GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTA
+
""""&""&&"&&&&&&&"&&&&&&&&&&"&&&&"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"&&&&"&&&&&&&&&&&&&&"&"&&&&&"&&""""&"

That is odd. These are very poor Q scores (< 5), but they are valid Sanger-format quality scores.

Can you run the following program from the BBMap suite and post the result here? Normally you should see something like what is noted below.

$ testformat.sh in=seq.fq.gz
sanger    fastq    gz    interleaved    150bp
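To make the "Q < 5" reading concrete: under Phred+33, a character's score is its ASCII code minus 33, and the error probability is 10^(-Q/10). The visible characters in the quality line posted above are `"` and `&`. The sketch below decodes them; the function names are illustrative and not taken from any tool mentioned in this thread.

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(score: int) -> float:
    """Probability that the base call is wrong for a given Phred score."""
    return 10 ** (-score / 10)


# The two characters that are visible in the posted quality line.
scores = decode_phred33('"&')
print(scores)
print([round(error_probability(q), 2) for q in scores])
```

Here `"` (ASCII 34) decodes to Q1 and `&` (ASCII 38) to Q5, which correspond to error probabilities of roughly 0.79 and 0.32.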

This is what I get:

Exception in thread "main" java.lang.AssertionError: ASCII encoding for quality (currently ASCII-33) appears to be wrong for input quality 16 for base A at lines 1 and 3, position 1.  Please manually set qin=33 or qin=64.
@NB501288_371_HGFMMBGX7:1:11101:16319:1208#GGGGGGAGATCT
""""&""&&"&&&&&&&"&&&&&&&&&&"&&&&"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&"&&&&"&&&&&&&&&&&&&&"&"&&&&&"&&""""&"
[34, 16, 34, 34, 34, 38, 34, 34, 38, 38, 34, 38, 16, 38, 38, 38, 38, 38, 38, 16, 34, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 34, 38, 38, 38, 38, 34, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 16, 38, 38, 38, 38, 34, 38, 29, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 34, 38, 38, 16, 29, 38, 38, 34, 38, 38, 38, 38, 38, 38, 38, 38, 29, 38, 38, 38, 38, 38, 38, 34, 38, 34, 38, 38, 38, 38, 38, 16, 16, 29, 34, 38, 38, 34, 34, 34, 34, 38, 34, 29]
    at stream.FASTQ.testQuality(FASTQ.java:220)
    at fileIO.FileFormat.testInterleavedAndQuality(FileFormat.java:520)
    at fileIO.FileFormat.testInterleavedAndQuality(FileFormat.java:435)
    at fileIO.FileFormat.testFormat(FileFormat.java:370)
    at fileIO.FileFormat.<init>(FileFormat.java:219)
    at fileIO.FileFormat.testInput(FileFormat.java:161)
    at fileIO.FileFormat.testInput(FileFormat.java:143)
    at fileIO.FileFormat.test(FileFormat.java:67)
    at fileIO.FileFormat.main(FileFormat.java:59)
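One way to read this output: the bracketed list appears to be the raw byte values of the quality line, and it contains 16 and 29, which are non-printable control characters below 33. That would explain why they are invisible in the pasted quality string. As a possible explanation only (nothing in this thread confirms it), a converter that adds 33 to a stored score of 239 and keeps the result in a single byte would produce 16, the very value BBMap complains about. The sketch below shows that arithmetic and a simple check for out-of-range bytes; both function names are hypothetical.

```python
from typing import List


def wrapped_byte(score: int, offset: int = 33) -> int:
    """Byte that results if score + offset is stored in 8 bits with no range check.

    ASSUMPTION: this models a hypothetical unchecked conversion, not any
    specific tool's documented behavior.
    """
    return (score + offset) % 256


def non_printable_positions(raw_quals: List[int]) -> List[int]:
    """Zero-based positions whose byte value falls outside 33..126."""
    return [i for i, value in enumerate(raw_quals) if not 33 <= value <= 126]


# First ten values from the BBMap output above.
first_values = [34, 16, 34, 34, 34, 38, 34, 34, 38, 38]
print(wrapped_byte(239))
print(non_printable_positions(first_values))
```

Under that assumption, 239 + 33 = 272, and 272 modulo 256 is 16. The first out-of-range byte sits at zero-based position 1, matching the "position 1" in the BBMap message.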

How did you convert the files to fastq? Like this?

samtools sort -n your.bam | samtools fastq -o reads.fastq -

I sorted the bam file like this:

samtools sort myfile.bam -o myfile.sorted.bam

I converted the resulting sorted bam using bedtools:

bedtools bamtofastq -i myfile.sorted.bam -fq myfile.sorted.fq

Thank you for your time and help!


You need to name-sort your BAM files before converting them to FASTQ. Can you try my command above? It may not make a difference, but it is worth a try. There may be some strange characters in your file. Are you using a non-English system locale?


I am not. I used your command above, name-sorting with samtools and piping the result into samtools fastq. I have two BAM files for this paired-end sequencing run, and I performed the sort and conversion on each file separately. When I ran FastQC on them, I got a different error for each file:

Started analysis of 3180L_L1_1.reads.fastq
Approx 5% complete for 3180L_L1_1.reads.fastq
Approx 10% complete for 3180L_L1_1.reads.fastq
Approx 15% complete for 3180L_L1_1.reads.fastq
Approx 20% complete for 3180L_L1_1.reads.fastq
Approx 25% complete for 3180L_L1_1.reads.fastq
Approx 30% complete for 3180L_L1_1.reads.fastq
Approx 35% complete for 3180L_L1_1.reads.fastq
Approx 40% complete for 3180L_L1_1.reads.fastq
Approx 45% complete for 3180L_L1_1.reads.fastq
Approx 50% complete for 3180L_L1_1.reads.fastq
Approx 55% complete for 3180L_L1_1.reads.fastq
Approx 60% complete for 3180L_L1_1.reads.fastq
Approx 65% complete for 3180L_L1_1.reads.fastq
Approx 70% complete for 3180L_L1_1.reads.fastq
Approx 75% complete for 3180L_L1_1.reads.fastq
Approx 80% complete for 3180L_L1_1.reads.fastq
Approx 85% complete for 3180L_L1_1.reads.fastq
Approx 90% complete for 3180L_L1_1.reads.fastq
Approx 95% complete for 3180L_L1_1.reads.fastq
Analysis complete for 3180L_L1_1.reads.fastq
Failed to process file 3180L_L1_1.reads.fastq
java.lang.IllegalArgumentException: No known encodings with chars < 33 (Yours was )
    at uk.ac.babraham.FastQC.Sequence.QualityEncoding.PhredEncoding.getFastQEncodingOffset(PhredEncoding.java:32)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.getPercentages(PerBaseQualityScores.java:71)
    at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.raisesError(PerBaseQualityScores.java:166)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
    at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:155)
    at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
    at java.lang.Thread.run(Thread.java:745)
Failed to process 3203L_L1_1.reads.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
    at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:158)
    at uk.ac.babraham.FastQC.Sequence.FastQFile.<init>(FastQFile.java:89)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:106)
    at uk.ac.babraham.FastQC.Sequence.SequenceFactory.getSequenceFile(SequenceFactory.java:62)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:129)
    at uk.ac.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:102)
    at uk.ac.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:316)

At this point you can try a fastq validation program like fq lint (LINK).

There is also validateFiles from Jim Kent (C: Viewing and editing FASTQ files )

Both programs should identify the corrupt records, which you will likely need to delete. Curiously, this error must already be present in your BAM files, since it is being carried forward. You can check on that.
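If neither validator is at hand, a minimal check along the same lines can be sketched in a few lines of Python. This is not a replacement for fq lint or validateFiles and does not reproduce their checks. It assumes plain, unwrapped four-line FASTQ records, and every function name here is made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def check_record(record: List[str]) -> str:
    """Return an empty string for a plausible record, else a short reason."""
    if len(record) != 4:
        return "truncated record"
    header, sequence, separator, quality = record
    if not header.startswith("@"):
        return "ID line didn't start with '@'"
    if not separator.startswith("+"):
        return "separator line didn't start with '+'"
    if len(sequence) != len(quality):
        return "sequence and quality lengths differ"
    if any(not 33 <= ord(ch) <= 126 for ch in quality):
        return "quality character outside '!'..'~'"
    return ""


def iter_records(lines: Iterable[str]) -> Iterator[Tuple[int, List[str]]]:
    """Yield (first line number, up to four lines) per record."""
    chunk: List[str] = []
    start = 1
    for number, line in enumerate(lines, start=1):
        if not chunk:
            start = number
        chunk.append(line.rstrip("\n"))
        if len(chunk) == 4:
            yield start, chunk
            chunk = []
    if chunk:
        yield start, chunk


def find_bad_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """List (starting line, reason) for every record that fails a check."""
    bad = []
    for start, record in iter_records(lines):
        reason = check_record(record)
        if reason:
            bad.append((start, reason))
    return bad


def report(path: str) -> None:
    # latin-1 maps every byte to one character, so control bytes survive decoding;
    # newline="\n" keeps a stray carriage return from being treated as a line break.
    with open(path, encoding="latin-1", newline="\n") as handle:
        for line_number, reason in find_bad_records(handle):
            print(f"record starting at line {line_number}: {reason}")
```

Calling `report("reads.fastq")` (a placeholder path) prints one line per suspicious record. Note that once a record has an extra or missing line, every later four-line chunk is misaligned, so only the first structural error reported is reliable.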


Thank you for all your help. I'll try this.
