Dear All,
I'm using FastQC to check the quality of my RNA-seq data, and as you know the results consist of 11 modules (Basic Statistics, ..., Kmer Content). I need a detailed tutorial or document explaining these modules. I have looked at the one on the program's website, but it is not detailed enough.
Perhaps it'd be simpler to explain what part of the output you don't understand.
Sequence Length Distribution, Sequence Duplication Levels, Overrepresented Sequences, and Kmer Content.
Sequence length distribution is just the distribution of your read lengths. If the reads have been trimmed, there will be more than one length and FastQC might issue a warning, which you can ignore. You can probably ignore the duplication level too; it's pretty meaningless in RNA-seq, since highly expressed transcripts naturally produce many identical reads. The overrepresented sequences can tell you whether you have a lot of remnant adapter contamination, but they will mostly be fragments of rRNAs, which you don't care about. The Kmer content is generally not that useful in RNA-seq, since you'd expect anything in an rRNA (even if you deplete rRNA) to show enrichment.
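If it helps to see what those two modules are counting, here is a minimal Python sketch. It is not how FastQC itself computes anything; it just tallies read lengths and the most frequent exact sequences. The function names and the three-read example are made up for illustration, and it assumes a simple FASTQ with exactly four lines per record.

```python
from collections import Counter
import io


def read_fastq(handle):
    """Yield (sequence, quality) pairs from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def length_and_duplicates(handle, top_n=3):
    """Count read lengths and return the top_n most frequent sequences."""
    lengths = Counter()
    seqs = Counter()
    for seq, _ in read_fastq(handle):
        lengths[len(seq)] += 1
        seqs[seq] += 1
    return lengths, seqs.most_common(top_n)


# Made-up example: two identical 8 bp reads and one read trimmed to 6 bp.
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTACGT\n+\nIIIIIIII\n"
    "@r3\nACGTAC\n+\nIIIIII\n"
)
lengths, top = length_and_duplicates(example)
print(dict(lengths))
print(top)
```

More than one key in `lengths` is all that a trimmed library looks like here, and a sequence that tops the `top` list would be the kind of thing you'd compare against your adapter or rRNA sequences.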
As Michael just mentioned, don't be too concerned with warnings. The main use is to see if you've screwed up adapter/quality trimming or if you had a bubble pass through the flow-cell at some point, causing a transient decrease in quality.
Thanks a lot dpryan79.
FastQC might just scare you by showing a warning for almost every QC test in RNA-seq. The per-position base and Kmer content in particular put me off by showing a strong bias at one end. I found an explanation on SEQanswers that this is possibly a bias from the reverse transcription or PCR step, which cannot be avoided (yet) and doesn't go away with trimming (unless you chop off the ends, which just conceals the problem). So I tend to ignore those warnings unless the alignment rates drop drastically.
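For anyone who wants to look at that positional bias outside of FastQC, a rough sketch of the per-position base composition could look like the following. The function name and the four reads are invented for the example, and real data would of course come from your FASTQ file.

```python
from collections import Counter


def base_fraction_by_position(reads):
    """Return a list of {base: fraction} dicts, one per read position."""
    counts = []
    for read in reads:
        for i, base in enumerate(read):
            if i == len(counts):
                counts.append(Counter())  # first read reaching this position
            counts[i][base] += 1
    return [
        {base: n / sum(c.values()) for base, n in c.items()}
        for c in counts
    ]


# Made-up reads with a skewed composition at the first positions.
reads = ["GGACGT", "GGTCGA", "GCACTT", "GGAAGC"]
fractions = base_fraction_by_position(reads)
for pos, frac in enumerate(fractions, start=1):
    print(pos, frac)
```

In an unbiased library each base's fraction should stay roughly flat along the read; a skew confined to the first few positions is the pattern described above.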
Thanks Michael. I think Per base sequence quality is the important one, since it shows the quality of the sequence.
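As a rough illustration of what the Per base sequence quality module summarizes, here is a small sketch that averages Phred scores at each read position. It assumes Phred+33 encoded quality strings (the `offset=33` default), and the function name and quality strings are made up. FastQC draws a box-and-whisker plot per position, so the plain mean here is a simplification.

```python
def mean_quality_by_position(quality_strings, offset=33):
    """Return the mean Phred score at each position (assumes Phred+33)."""
    totals, counts = [], []
    for q in quality_strings:
        for i, ch in enumerate(q):
            if i == len(totals):
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset  # decode ASCII character to Phred
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Made-up quality strings of different lengths, with quality dropping at the 3' end.
quals = ["IIII5", "IIH#", "II5"]
means = mean_quality_by_position(quals)
print(means)
```

A drop towards the end of the read is normal; a dip in the middle that recovers afterwards would be more like the transient flow-cell problem mentioned earlier in the thread.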