I read everywhere that you can't rely fully on the FastQC report when looking at your reads, and that what counts as acceptable depends on the experiment you are working with. In my case the reads were generated by Illumina sequencing. What should I expect as normal when looking at the fastqc.html?
Well in my case the reads were generated using IlluminaSeq. What do I
expect as normal when looking at the fastqc.html?
Well, for short reads like the ones you have here (~75 nt), I would expect better quality at the 3' end (Q ~28 or higher), because that is what you get when everything runs smoothly:
RNA quality (RIN >= 8, 260/280 ~2, etc.)
Concentration
Library prep controls
Sequencing performance
Nonetheless, if the QC doesn't look ok to you, you can always trim/filter your sequences.
They may also lead to increased false-positive variant calls,
resulting in inaccurate conclusions.
Not necessarily, it depends on many variables, including the coverage.
it says that it passed. But my professor told me that the ranges shouldn't be like that? Also if you go to: https://www.illumina.com/science/technology/next-generation-sequencing/plan-experiments/quality-scores.html
it says that "Lower Q scores can result in a significant portion of the reads being unusable. They may also lead to increased false-positive variant calls, resulting in inaccurate conclusions."
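For context on what those Q scores mean: a Phred score Q corresponds to an error probability of 10^(-Q/10), and FASTQ files store each score as a single character (ASCII code minus an offset, 33 on current Illumina output). A minimal sketch of that conversion; the function names are just for illustration:

```python
def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Turn a FASTQ quality string into a list of Phred scores.

    offset=33 assumes Phred+33 encoding (Sanger / Illumina 1.8+).
    """
    return [ord(c) - offset for c in qual]


if __name__ == "__main__":
    for q in (10, 20, 28, 30, 40):
        p = phred_to_error_prob(q)
        print(f"Q{q}: error probability {p:.5f} (about 1 in {round(1 / p)})")
```

So Q20 is a 1% chance of a wrong call and Q30 is 0.1%, which is why a dip toward Q20 at the read ends is not automatically fatal.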
You asked this question in a different way a few days back: FASTQC.html: Quality control of reads.
If it is a matter of visually having everything above a certain Q score, then trim your data using that score as the cutoff. Real-life data can look worse than this as far as Q scores go and still work fine.
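To make "trim using that score cutoff" concrete, here is a deliberately naive sketch that clips low-quality bases off the 3' end of a single read. It assumes Phred+33 encoding, and `trim_3prime` is a made-up name; real trimmers generally use more robust schemes (sliding windows and the like), so treat this as an illustration of the idea rather than what any particular tool does:

```python
def trim_3prime(seq: str, qual: str, cutoff: int = 20, offset: int = 33) -> tuple[str, str]:
    """Remove trailing bases whose Phred score is below `cutoff`.

    Stops at the first base (scanning from the 3' end) that meets the cutoff,
    so low-quality bases in the middle of the read are left untouched.
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < cutoff:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # 'I' = Q40, '5' = Q20, '#' = Q2, '!' = Q0 under Phred+33
    print(trim_3prime("ACGTACGT", "IIIII5#!", cutoff=20))
```

After trimming you would normally also drop reads that end up too short to map reliably.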
perfect, thanks!
What kind of data is this? RNAseq?
Is it an 'old' dataset or something that has been generated recently?
Yes, it is RNA-seq data (Illumina).
The general trend you see on that plot--lower quality scores on the read ends, higher quality in the middle of the read--is pretty typical. Your average quality scores look decent, but the variance in quality score near the end of the read is pretty high. I don't usually see the interquartile range of the quality scores dip so low these days.
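If you want to look at the numbers behind those boxes yourself, the per-position quartiles can be approximated with a few lines of Python. This is only a rough sketch: it assumes an uncompressed four-line-per-record FASTQ with Phred+33 encoding, the function names are invented for the example, and the values will not necessarily match FastQC's exactly, since FastQC may bin positions and computes its percentiles its own way:

```python
from statistics import quantiles


def read_qualities(path: str):
    """Yield the quality string (4th line) of each record in a plain FASTQ file."""
    with open(path) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 3:
                yield line.rstrip("\n")


def per_position_quartiles(quals: list[str], offset: int = 33) -> list[tuple]:
    """Return (lower quartile, median, upper quartile) for each read position."""
    longest = max(len(q) for q in quals)
    result = []
    for pos in range(longest):
        # Only reads long enough to cover this position contribute a score.
        scores = [ord(q[pos]) - offset for q in quals if len(q) > pos]
        if len(scores) < 2:
            # quantiles() needs at least two values; fall back to the single score.
            result.append((scores[0],) * 3)
            continue
        q1, med, q3 = quantiles(scores, n=4, method="inclusive")
        result.append((q1, med, q3))
    return result


if __name__ == "__main__":
    demo = ["II", "I5", "I!", "I#"]  # four toy reads, two bases each
    for pos, (q1, med, q3) in enumerate(per_position_quartiles(demo), start=1):
        print(f"position {pos}: Q1={q1} median={med} Q3={q3}")
```

With a real file you would call `per_position_quartiles(list(read_qualities("reads.fastq")))`, where `reads.fastq` is a placeholder path.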
I would definitely recommend doing some read trimming/filtering and re-assessing the QC metrics after this step. A tool like fastp can give you good "before and after" QC results.
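As a starting point, a single-end fastp run with quality trimming might look like the sketch below. The file names and thresholds are placeholders rather than recommendations, and the flags shown (`-i`, `-o`, `-q`, `-l`, `--cut_tail`, `-h`, `-j`) are the ones I believe fastp documents, so check `fastp --help` for your installed version before relying on them:

```python
import shlex


def fastp_command(fastq_in: str, fastq_out: str, min_qual: int = 20,
                  min_len: int = 30, prefix: str = "fastp_report") -> list[str]:
    """Build a single-end fastp command line (arguments are illustrative)."""
    return [
        "fastp",
        "-i", fastq_in,          # input reads
        "-o", fastq_out,         # trimmed/filtered output
        "-q", str(min_qual),     # Phred score at which a base counts as "qualified"
        "-l", str(min_len),      # discard reads shorter than this after trimming
        "--cut_tail",            # sliding-window quality trimming from the 3' end
        "-h", f"{prefix}.html",  # HTML report with before/after QC
        "-j", f"{prefix}.json",  # same metrics in JSON
    ]


cmd = fastp_command("sample.fastq.gz", "sample.trimmed.fastq.gz")

if __name__ == "__main__":
    print(shlex.join(cmd))
    # To actually run it (requires fastp on your PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

The HTML report shows the quality curves before and after filtering side by side, and you can also rerun FastQC on the output file to compare with your original report.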