I have several large FASTQ files from a human tumor sequencing project. I know I need to run quality control first, and that I need the Phred scores to do that. How do I get the Phred scores for each FASTQ file? What software or programming do I need? I know Galaxy does this, but I cannot run Galaxy on our server because the files are far too large, and cloud computing is not an option for us.
I am completely new to scripting, computer science, etc., as I'm mainly a molecular biologist, so if someone could point me to a useful resource, that would be awesome.
PRINSEQ is good: http://edwards.sdsu.edu/prinseq/
FastQC, in the Sequencher software (trial period).
Phred scores are assigned to each base in an NGS read. There is no such thing as an average Phred score for a read or for a set of sequences in a file.
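To make the per-base point concrete, here is a minimal Python sketch of how the quality line of a FASTQ record decodes into one Phred score per base. It assumes the Sanger / Illumina 1.8+ encoding (ASCII offset 33); older Illumina files used offset 64, so check your data first. The function names and the quality string are made up for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into one Phred score per base.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Convert a Phred score Q to the estimated probability that the base call is wrong."""
    return 10 ** (-q / 10)


# Hypothetical quality line for an 8-base read
quality_line = "IIIIHGF#"
print(phred_scores(quality_line))  # [40, 40, 40, 40, 39, 38, 37, 2]
```

Each character stands for a single base, and Q = -10 * log10(P), so Q20 corresponds to an estimated 1-in-100 chance that the base call is wrong.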
What OS do you have access to?
If there is no such thing as an average Phred score, how does one determine whether the quality of the reads is good or not? Is there some type of cutoff value that people use?
I'm using OS X 10.8.5.
You should be able to use FastQC (it is a Java program) as mentioned by @Harold below. There are several useful blog posts by Dr. Simon Andrews (author of FastQC) here. I suggest you do some reading there. Look at a few posts tagged FastQC on Biostars. Then run your data through FastQC and come back with questions if and when they arise.
In general, if you are going to re-align the data to a well-known reference, then you may be able to use bases with Q scores as low as 10-15. For de novo work you would want to be more stringent (Q25 and above).
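If you want a rough feel for how your data compare with cutoffs like these before (or alongside) running FastQC, a small Python sketch along the following lines may help. It streams through the file one line at a time, so large files do not need to fit in memory. It assumes Phred+33 encoding and strict four-line FASTQ records (no wrapped sequence lines); the function names are hypothetical, and this is not a replacement for FastQC's reports.

```python
import gzip


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fraction_at_or_above(path, threshold, offset=33):
    """Return the fraction of bases in a FASTQ file with Phred score >= threshold.

    Assumes four lines per record, with the quality string on the fourth line.
    """
    total = 0
    passing = 0
    with open_fastq(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 3:  # skip header, sequence, and '+' lines
                continue
            for ch in line.rstrip("\r\n"):
                total += 1
                if ord(ch) - offset >= threshold:
                    passing += 1
    return passing / total if total else 0.0
```

For example, `fraction_at_or_above("tumor_sample.fastq.gz", 15)` (the file name is a placeholder) gives the share of bases at Q15 or better, and changing the threshold to 25 gives the stricter figure for de novo work.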