Hi
I saw the FastQC video under the videos section on Biostar.
I have a question.
Why is it that, in the first 12-13 bases, the per base sequence content and per base GC content are often quite wavy even though the per base sequence quality is very good? What can be done to fix this?
With RNA-seq, this can happen due to biases in random hexamer priming during the RT step (explaining the first 6 bases) possibly combined with sequence specificity of the polymerase itself and/or artifacts from end repair (possibly explaining out to 13 bases).
I think the assumption is that for standard differential expression, any sequence bias in a gene is the same between samples, so it's not a problem. However, it is a problem for estimating expression within a single sample (e.g. FPKM), since the transcripts being compared to each other may have different biases.
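If you want to look at the numbers behind the wavy plot, here is a minimal Python sketch that tabulates base fractions at the first few read positions, which is roughly the quantity the per base sequence content plot displays. The function name `per_base_composition` and the toy reads are made up for illustration; in practice you would feed it the sequence lines from your own FASTQ file.

```python
from collections import Counter

def per_base_composition(reads, n_positions=13):
    """Fraction of A/C/G/T/N at each of the first n_positions of the reads."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        # Only the first n_positions bases of each read are inspected.
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: (c[b] / total if total else 0.0) for b in "ACGTN"})
    return fractions

# Toy reads, invented for this example (not real data).
reads = ["ACGTAC", "ACGTTT", "TCGAAC", "ACGAGG"]
comp = per_base_composition(reads, n_positions=4)
for pos, frac in enumerate(comp, start=1):
    print(pos, frac)
```

Positions where the fractions swing strongly from one base to the next, and then flatten out after about base 13, would be consistent with the priming bias described above.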
Have you checked whether those first few bases belong to an adaptor/barcode sequence? If such sequences are left untrimmed, they can produce the pattern you describe. I may be completely wrong, but go through the FastQC report, and if those sequences show up in the Overrepresented sequences section then you need to trim them off.
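As a quick sanity check alongside the FastQC report, you could count the most common read prefixes yourself. This is only a sketch: `top_prefixes` is a hypothetical helper, the prefix length of 12 is an arbitrary choice, and the reads below are invented for illustration.

```python
from collections import Counter

def top_prefixes(reads, k=12, n=5):
    """Return the n most common length-k prefixes among the reads."""
    # Reads shorter than k are skipped so every prefix has the same length.
    prefixes = Counter(read[:k].upper() for read in reads if len(read) >= k)
    return prefixes.most_common(n)

# Toy reads, invented for this example (not real adaptor sequences).
reads = [
    "GGTTCCAAGGTTACGT",
    "GGTTCCAAGGTTTTTT",
    "GGTTCCAAGGTTCACA",
    "TTGACCATGCAAGTCC",
]
print(top_prefixes(reads, k=12, n=2))
```

If one prefix accounts for a large share of the reads, compare it against the adaptor and barcode sequences of your library prep kit before deciding to trim.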
The origin of the sample also matters. If the sample preparation isolates certain parts of a genome, for example in a ChIP-Seq experiment, we could expect that to be reflected in the sequence content of the reads.
Can you post a plot or the numerical values?
(+1) It definitely helps to see the FastQC plot.
I added the plots. Have a look
I added the plot. Have a look.
Can you also tell if this is RNA-Seq data?
The data is RNA-Seq