Fastq Quality Check
3
5
13.1 years ago
toshnam ▴ 650

Hi all,

I'm trying to check the sequencing quality of a FASTQ file from a HiSeq 2000. I used the fastx_quality_stats script from FASTX-Toolkit (version 0.0.13), but I got the following error:

$ fastx_quality_stats -i 6_1.fastq -o 6_1.stats
fastx_quality_stats: Invalid quality score value (char '#' ord 35 quality value -29) on line 4

The FASTQ file does contain the "#" character:

@HWI-ST621:210:C03D4ACXX:4:1101:1475:1957 1:N:0:ATCACG
NACTACAATTTACAGATAACTTTAAATTAAATTTTGGAATCAAATATAAAGATTGAAAATGAATTTTGAATATATGAAAATCCATTTAAAGAGTTTGGTAC
+
#1=DDDFFHHDHHIIIJJEHIJJJJJIIIJFIGGJJJFICGIGGGIIJIEIIIIJIJIIIIHIIIJIGGIJIIIJGHIEHJJJHHHHHHHFFF;B@CA;;@

Is "#" an invalid quality score character? I heard that this FASTQ file was checked with the quality trimming program in the NGS Cell package from CLCBio, and that the sequencing quality was good. So is the "#" character invalid only for FASTX-Toolkit?

I also used the PoPoolation toolbox (version 1.2.2) to quality-trim the FASTQ files, and got the following results:

$ trim-fastq.pl --input1 6_1.fastq --input2 6_2.fastq --output trimmed

......................................................

FINISHED: end statistics
Read-pairs processed: 53675033
Read-pairs trimmed in pairs: 0
Read-pairs trimmed as singles: 0


FIRST READ STATISTICS
First reads passing: 0
5p poly-N sequences trimmed: 632578
3p poly-N sequences trimmed: 0
Reads discarded during 'remaining N filtering': 0
Reads discarded during length filtering: 53675033
Count sequences trimed during quality filtering: 53675033

Read length distribution first read
length  count


SECOND READ STATISTICS
Second reads passing: 0
5p poly-N sequences trimmed: 628623
3p poly-N sequences trimmed: 801
Reads discarded during 'remaining N filtering': 0
Reads discarded during length filtering: 53675033
Count sequences trimed during quality filtering: 53675033

Read length distribution second read
length  count

As you can see, every read was discarded during quality trimming.
I've worked with GAII and HiSeq 2000 sequence data before, but this is the first time I've seen this. I wonder whether the problem is caused by bad sequencing quality or by a mistake on my part.

I appreciate any help.
Thanks.

fastq fastx • 21k views
2

Solution 1. Use an alternative program such as FastQC. Solution 2. Use the -Q33 option with FASTX-Toolkit. Thanks, guys :-)

7
13.1 years ago
Rm 8.3k

Try adding the -Q33 option to the fastx command and run it again:

fastx_quality_stats -Q33 -i 6_1.fastq -o 6_1.stats
4
13.1 years ago
toni ★ 2.2k

This looks like a quality encoding problem in your file.

Judging by the error (35 - 64 = -29), FASTX-Toolkit assumes that your file is in Illumina 1.3+ encoding, whereas your file seems to be in Sanger encoding, which has an offset of 33 instead of 64.
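
To make the arithmetic concrete, here is a minimal Python sketch that decodes the start of the quality line from the question with both offsets. The function name `decode_qualities` is made up for illustration and is not part of FASTX-Toolkit.

```python
def decode_qualities(quality_string, offset):
    """Convert FASTQ quality characters to integer scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality_string]


# First ten quality characters of the read shown in the question.
quality = "#1=DDDFFHH"

sanger = decode_qualities(quality, 33)      # Sanger / Illumina 1.8+ (Phred+33)
illumina13 = decode_qualities(quality, 64)  # Illumina 1.3+ (Phred+64)

print(sanger[0])      # 2
print(illumina13[0])  # -29, the value reported in the error message
```

With offset 33 every score is non-negative, while offset 64 gives a negative score for the very first character, which is consistent with the error on line 4 of the file.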

See this article for more information on quality score encodings:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/

There may be options in fastx-toolkit to handle this.

0

Thank you for your comment. As far as I know, the latest fastx-toolkit can basically read both FASTQ types, Sanger and Solexa (please refer to the update news on the fastx-toolkit homepage). I also checked the manual for fastx_quality_stats and couldn't find any option for this problem.

2
13.1 years ago
pmenzel ▴ 310

Yes, fastx toolkit doesn't work with the quality scores of some versions of the Illumina software.

4

fastx toolkit can handle other quality encodings. It isn't documented, but with e.g. -Q33 one can use Sanger-encoded data.

2

Check out FastQC, which is good and guesses the encoding internally. http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
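
As a rough idea of how such a guess can work, here is a small Python heuristic based on the lowest quality character seen. This is only a sketch, not FastQC's actual implementation, and the function name `guess_offset` is made up for illustration. It assumes that 64-based files never contain characters below ';' (ASCII 59), the lowest character used by the old Solexa encoding.

```python
def guess_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from the lowest quality character.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in
    64-based encodings, so their presence implies Phred+33. If none are
    seen, 64 is returned, although high-quality Phred+33 data could also
    look like that. Raises ValueError if no quality characters are given.
    """
    lowest = min(ord(ch) for line in quality_lines for ch in line)
    return 33 if lowest < 59 else 64


# Start of the quality line from the question; '#' (ASCII 35) forces Phred+33.
print(guess_offset(["#1=DDDFFHHDHHIIIJJ"]))  # 33
```

In practice one would scan many reads rather than a single line, since a short sample may not contain any low-quality characters.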

0

Really? Can you recommend any alternative free program to check sequencing quality of my fastq?

0

Thanks, tony and daniel. FastQC is working well with my FASTQ file.

0

Thanks, Jan. I confirmed that the "-Q33" option works well with my FASTQ file.

0

+1 for fastqc, love it.

0

Thanks Jan, I didn't know that either.

0

I also like SolexaQA a lot: http://solexaqa.sourceforge.net/

0

Note to commenters: Try to avoid using the comments as a place to answer the question. In this case the answer is what Jan van Haars mentions: one needs to pass the option -Q33 to the tool. Comments are for asking for clarification.
