Hi,
I'm wondering whether there is a convenient way to count the number of bases in a FASTQ
file with a Phred
quality score equal to or higher than 30. Thanks!
gunzip -c input.fastq.gz | paste - - - - | cut -f 4 | fold -w 1 | awk '($1>="?")' | wc -l
gunzip -c input.fastq.gz : decompress the input
paste - - - - : linearize each 4-line record into one tab-separated line
cut -f 4 : get the quality column
fold -w 1 : fold to one character per line
awk '($1>="?")' : select characters greater than or equal to "?" (ASCII code 63 = 33 (FASTQ Phred+33 offset) + 30)
wc -l : count
The comparison is based on ASCII decimal values: it checks whether the character in $1
(an individual encoded Q score) is greater than or equal to "?".
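As a rough sketch of the same idea without creating one line per base, the low-quality characters can be deleted and the remaining bytes counted. This assumes Phred+33 encoding and exactly four lines per record, as the pipeline above does; count_q30 is a made-up helper name, not part of any tool.

```shell
# Count bases with Phred quality >= 30 in a FASTQ stream read from stdin.
# Assumptions: Phred+33 encoding, exactly 4 lines per record.
# count_q30 is a hypothetical helper name used only for illustration.
count_q30() {
  awk 'NR % 4 == 0' |        # keep only the quality line of each record
    LC_ALL=C tr -d '!->\n' | # delete Q0..Q29 characters ("!" through ">") and newlines
    wc -c |                  # every remaining byte is one base with Q >= 30
    tr -d ' '                # some wc implementations pad the number with spaces
}
```

It would be used as: gunzip -c input.fastq.gz | count_q30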
reformat.sh from BBTools / BBMap package can also do this:
reformat.sh in=file.fq.gz qchist=file.qchist.txt
For paired-end files:
reformat.sh in1=file1.fq.gz in2=file2.fq.gz qchist=file.qchist.txt
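To get a single Q>=30 number out of the histogram file, the per-quality counts can be summed. The sketch below assumes a layout that should be checked against your own file.qchist.txt: tab-separated, header lines starting with "#", quality value in column 1 and base count in column 2. If the file has further count columns (for example for a second read), the sum would need to be extended. sum_q30_from_hist is a hypothetical helper name.

```shell
# Sum the base counts for quality values >= 30 from a quality-count histogram.
# ASSUMED layout (verify against your file): tab-separated, "#"-prefixed
# header lines, column 1 = quality value, column 2 = base count.
# sum_q30_from_hist is a hypothetical helper name used only for illustration.
sum_q30_from_hist() {
  awk -F '\t' '
    !/^#/ && $1 >= 30 { n += $2 }   # skip header lines, add counts for Q >= 30
    END { printf "%.0f\n", n }      # print as a plain integer, even for large totals
  ' "$1"
}
```

It would be used as: sum_q30_from_hist file.qchist.txt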
FastQC provides a graphical quality per base position summary, and if you dig into its output files, you will find mean, median, lower quartile, upper quartile, 10th percentile and 90th percentile quality per base position.
Why do you need the total count?
We often use the number of bases with quality score >=30 as a quick assay of data quality. Besides, our sequencing facilities commonly use that statistic.