Hello there, I want to determine the percentage of bases above Q20, Q30, and Q40 in some FASTQ files produced by an Illumina HiSeq 2000. Do you have any ideas?
Thanks
Use FastQC. It is a very popular program, easily found with a single search. It will give you the mean, median, quartiles, and other quality statistics for every read position.
Thanks Antonio, but I want the percentage of bases above Q30 across all reads and positions. The sequencing company sent me a report that says Q20 = 98%, and I want to recalculate it myself, and also calculate Q30 and Q40. By the way, I want to bring the Q20 of my data close to 100%. Any ideas?
I just want to evaluate all reads with one score. I sequenced on an Illumina HiSeq 2000. I would like to know what percentage of my data scores above Q20, what percentage above Q30, and so on. I have run FastQC, but I am curious to find the Q20, Q30, etc. of my data. Is there a tool or script for this?
One should be careful about an "average quality score" for an entire file, since a subset of "bad" sequences can remain hidden as outliers in an otherwise "good" file. Take a look at this WEB PAGE to understand.
If you are still interested, the "Compute quality statistics" function in Galaxy (e.g. at https://main.g2.bx.psu.edu) will do it, and possibly one of the utilities provided by the fastx-toolkit.
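To illustrate the caution about file-wide averages, here is a minimal Python sketch. It is not taken from any of the tools mentioned in this thread; the function names are made up for illustration, and it assumes Phred+33 quality strings (older Illumina pipelines, before CASAVA 1.8, wrote Phred+64, in which case pass `offset=64`).

```python
def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read's quality string (assumes the given ASCII offset)."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def summarize_qualities(quality_strings, min_read_mean=20, offset=33):
    """Return (file-wide mean base quality, fraction of reads whose own mean is below min_read_mean).

    quality_strings: iterable of FASTQ quality lines, one per read.
    """
    reads = [q for q in quality_strings if q]  # skip empty quality lines
    if not reads:
        raise ValueError("no quality strings given")

    total_score = sum(ord(ch) - offset for q in reads for ch in q)
    total_bases = sum(len(q) for q in reads)
    overall_mean = total_score / total_bases

    low_reads = sum(1 for q in reads if mean_phred(q, offset) < min_read_mean)
    return overall_mean, low_reads / len(reads)


# Toy data: nine reads at Q40 ('I') and one read at Q2 ('#').
toy = ["IIII"] * 9 + ["####"]
overall, low_fraction = summarize_qualities(toy)
print(f"file-wide mean quality: {overall:.1f}")
print(f"fraction of reads with mean below Q20: {low_fraction:.0%}")
```

In this toy case the file-wide mean still looks good even though one read in ten is essentially unusable, which is why a per-read or per-position view is worth checking alongside any single summary number.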
Thanks again Antonio. You are right, and I agree that an average quality score is not good enough. I have run FastQC on my data and have the quality for each position. Everything is OK with that. I wanted an overall evaluation of my data out of curiosity. Finally, I found some tools for this purpose; "FaQCs" is one of them, and it works fine.
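For anyone who prefers a script over a tool, the overall figure asked about here (the percentage of all bases at or above Q20, Q30, and Q40) can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not how FaQCs or any sequencing provider computes its report: it assumes standard four-line FASTQ records (no wrapped sequence lines), Phred+33 encoding (pass `offset=64` for older Phred+64 files), and the common "at or above" convention for the thresholds. The function names are hypothetical.

```python
import gzip
from collections import Counter


def quality_histogram(path, offset=33):
    """Count how many bases carry each Phred score in a FASTQ file.

    Assumes four-line records, so every 4th line is a quality string.
    Files ending in .gz are read through gzip.
    """
    opener = gzip.open if path.endswith(".gz") else open
    char_counts = Counter()
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:  # quality line of each record
                char_counts.update(line.rstrip("\r\n"))
    # Convert ASCII characters to Phred scores using the assumed offset.
    return {ord(ch) - offset: n for ch, n in char_counts.items()}


def percent_at_or_above(histogram, thresholds=(20, 30, 40)):
    """Percentage of all bases whose Phred score is >= each threshold."""
    total = sum(histogram.values())
    if total == 0:
        return {q: 0.0 for q in thresholds}
    return {
        q: 100.0 * sum(n for score, n in histogram.items() if score >= q) / total
        for q in thresholds
    }
```

Calling `percent_at_or_above(quality_histogram("reads.fastq.gz"))` (the file name is a placeholder) returns a dictionary keyed by threshold. The result may differ slightly from a provider's report if they filtered or trimmed reads before computing their numbers.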
Try @Matt Shirley's fastqp tool (https://github.com/mdshw5/fastqp).
+1 for fastqp! :) This tool looks awesome!
I wish @Matt would convert the output (or at least make an option available) to create a standalone html file like FastQC.
Hahah, you just described the reason I made my QC tool SeQC :P (which is for BAM files, so not really useful here). FastQC frustrates me because getting the data out means parsing some HTML. I personally prefer tools like Matt's because the plotting is left up to me. However, some basic plotting functionality should be built in to speed things up. Once I finish my PhD I'll finish up SeQC and hopefully import Matt's stats as modules so we get the best of both worlds. It's taken me so long to write SeQC that the name has now been taken by 2 or 3 other projects, so perhaps I'll have to rename it to SlowQC or something.
Hate to hijack this thread, but since we are on the topic: in a test I ran, @Matt's program just produced a bunch of PNG files, so that is the default output. I have not gone back to explore other options yet. It also does not seem to pick up sample names from FASTQ files (like FastQC does) to label the output zip folder automatically.
FastQC provides the best of both worlds, which would be nice to get from fastqp.
For some who want/like the portability there is the html single file report. For users like you there is a zip archive with all underlying data (in text) for re-plotting/whatever.
Oh awesome! :) I knew there was a zip file, but that used to extract to a directory for the browser to navigate. I pleaded with I think Simon to make the html a single file so it would work better with a web-based logging program I was working on (that renames all files to their MD5 sum, and doesn't support directories), and after that change took place I just assumed the zip file just contained only the new single html file (but compressed). I never looked, hehe - time to check it out! Thanks :)
Tagging @Matt so he sees this thread: Matt Shirley