Number of bases with a certain quality in FASTQ file
2
0
Entering edit mode
6.4 years ago
Denis ▴ 310

Hi, I'm wondering is there a convenient way to count quantity of bases in FASTQ file with Phred quality score equal or higher then 30? Thanks!

next-gen sequencing • 3.5k views
ADD COMMENT
0
Entering edit mode

FastQC provides a graphical quality per base position summary, and if you dig into its output files, you will find mean, median, lower quartile, upper quartile, 10th percentile and 90th percentile quality per base position.

Why do you need total count?

ADD REPLY
0
Entering edit mode

We often use the number of bases with quality score >=30 as a quick assay of data quality. Besides, our sequencing facilities commonly use that statistic.

ADD REPLY
5
Entering edit mode
6.4 years ago
 gunzip -c  input.fastq.gz | paste - - - - | cut -f 4 | fold -w 1 | awk '($1>="?")' | wc -l
  • gunzip -c input.fastq.gz decompress input
  • paste - - - - linearize 4 lines
  • cut -f 4 get the quality column
  • fold -w 1 fold to one column
  • awk '($1>="?")' select strings greater than "?" (ASCII code=63 = 33(base fastq)+30.
  • wc -l count
ADD COMMENT
0
Entering edit mode

Thanks! But from where awk may know that for example "=" is less than "?" ? How can it compare just two characters?

ADD REPLY
1
Entering edit mode

Based on ASCII code decimal values. It is comparing value contained in $1 (which would be individual Q score) for being more than or equal to to ?.

ADD REPLY
0
Entering edit mode

Hi genomax! Thanks a lot! Now it's clear.

ADD REPLY
0
Entering edit mode

Should i add something like this tr -d '\n' somewhere in the pipe to remove new line characters? echo \n | awk '($0>="?")' results in n

ADD REPLY
1
Entering edit mode
6.4 years ago
h.mon 35k

reformat.sh from BBTools / BBMap package can also do this:

reformat.sh in=file.fq.gz qchist=file.qchist.txt

For paired end files

reformat.sh in1=file1.fq.gz in2=file2.fq.gz qchist=file.qchist.txt
ADD COMMENT

Login before adding your answer.

Traffic: 1959 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6