Hello,
I am downloading public data and running FastQC on a number of the FASTQ files. I get reports like this:
PASS Basic Statistics SRR2637682_1.fastq.bz2
PASS Per base sequence quality SRR2637682_1.fastq.bz2
PASS Per tile sequence quality SRR2637682_1.fastq.bz2
PASS Per sequence quality scores SRR2637682_1.fastq.bz2
FAIL Per base sequence content SRR2637682_1.fastq.bz2
FAIL Per sequence GC content SRR2637682_1.fastq.bz2
PASS Per base N content SRR2637682_1.fastq.bz2
PASS Sequence Length Distribution SRR2637682_1.fastq.bz2
FAIL Sequence Duplication Levels SRR2637682_1.fastq.bz2
WARN Overrepresented sequences SRR2637682_1.fastq.bz2
PASS Adapter Content SRR2637682_1.fastq.bz2
FAIL Kmer Content SRR2637682_1.fastq.bz2
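When many files are involved, it can help to collect these reports programmatically rather than reading them one by one. The sketch below assumes the lines come from the summary.txt file inside each FastQC output archive, with three tab-separated fields per line (STATUS, module name, input file name). The function names `parse_fastqc_summary` and `non_passing` are hypothetical helpers for illustration, not part of FastQC.

```python
import csv
from collections import defaultdict


def parse_fastqc_summary(lines):
    """Parse the lines of a FastQC summary.txt file.

    Assumes each line holds three tab-separated fields:
    STATUS, module name, input file name.
    Returns {input file name: {module name: status}}.
    """
    results = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        status, module, filename = row
        results[filename][module] = status
    return dict(results)


def non_passing(modules):
    """Return the (module, status) pairs whose status is not PASS, in file order."""
    return [(module, status) for module, status in modules.items() if status != "PASS"]


if __name__ == "__main__":
    example = [
        "PASS\tBasic Statistics\tSRR2637682_1.fastq.bz2",
        "FAIL\tPer base sequence content\tSRR2637682_1.fastq.bz2",
        "WARN\tOverrepresented sequences\tSRR2637682_1.fastq.bz2",
    ]
    summary = parse_fastqc_summary(example)
    for filename, modules in summary.items():
        for module, status in non_passing(modules):
            print(f"{filename}: {status} {module}")
```

Passing an open file handle (for example `open("summary.txt")`) works the same way, since `csv.reader` accepts any iterable of strings.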
I've read about lots of quality control tools that can fix some of these problems. However, I cannot find one that works properly and generates a "PASS" for all of these.
For example, I have absolutely no idea how to fix the "Kmer Content" module; all I know is that it has shown a FAIL in every real example I've seen.
All I can find are trimmers and adapter removers, which don't improve most of the modules here. "Per base sequence content", for example: I have no idea how to fix it, and it's always a FAIL.
FastQC doesn't actually fix anything, so how can I go about fixing all of these modules? Are there some that are okay to fail?
Some "problems" are not problems. For example:
You have to take FastQC warnings and fails with a grain of salt, taking into account the nature of the samples being analysed.
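One way to apply that advice consistently is to decide, per library type, which modules you expect to be flagged and review only the rest. The sketch below is illustrative only: the `EXPECTED_FLAGS` set is an assumption drawn from the modules discussed in this thread (hexamer-primed or Nextera libraries failing "Per base sequence content", and "Kmer Content" failing routinely), not an official list, and `triage` is a hypothetical helper. A module landing in the "investigate" list is not necessarily a problem; it just hasn't been explained yet by the nature of the sample.

```python
# Modules assumed (for illustration) to be flagged for reasons unrelated to
# data quality. Adjust this set for your own library type.
EXPECTED_FLAGS = frozenset({
    "Per base sequence content",
    "Kmer Content",
})


def triage(modules, expected=EXPECTED_FLAGS):
    """Split non-PASS modules into (expected, investigate) lists.

    modules: {module name: "PASS" | "WARN" | "FAIL"}, in report order.
    Each returned list holds (module, status) pairs in report order.
    """
    expected_hits, investigate = [], []
    for module, status in modules.items():
        if status == "PASS":
            continue
        target = expected_hits if module in expected else investigate
        target.append((module, status))
    return expected_hits, investigate


if __name__ == "__main__":
    report = {
        "Basic Statistics": "PASS",
        "Per base sequence content": "FAIL",
        "Per sequence GC content": "FAIL",
        "Sequence Duplication Levels": "FAIL",
        "Overrepresented sequences": "WARN",
        "Kmer Content": "FAIL",
    }
    expected, investigate = triage(report)
    print("expected:", expected)
    print("investigate:", investigate)
```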
P.S.: added a link to a post discussing the TruSeq hexamer priming problem.
Nextera genomic libraries also fail the "Per base sequence content" module, or at least they did a few years back.
I believe that was because of some residual transposase bias in the first 10-15 bp.
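To see whether a "Per base sequence content" failure comes only from the first bases of the reads (as with hexamer priming or transposase bias), you can compute the base composition per position yourself. This is a simplified sketch, not FastQC's implementation: FastQC groups positions into bins for longer reads, while this reports every position. The 10 and 20 percentage-point limits mirror the WARN/FAIL thresholds the FastQC documentation gives for this module (difference between A and T, or between G and C, at any position). The helper names are hypothetical, and the FASTQ reader assumes plain 4-line records.

```python
import bz2
from collections import Counter


def read_sequences(path, limit=100_000):
    """Yield sequence lines from a (possibly bz2-compressed) FASTQ file.

    Assumes standard 4-line FASTQ records; reads at most `limit` records.
    """
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= limit:
                break
            if i % 4 == 1:  # the second line of each record is the sequence
                yield line.strip()


def per_base_content(reads):
    """Percentage of A, C, G, T at each read position (0-based).

    `reads` must be a list (it is scanned once per position). Each position
    is counted only over reads long enough to cover it; N is ignored.
    """
    length = max((len(r) for r in reads), default=0)
    profile = []
    for pos in range(length):
        counts = Counter(r[pos].upper() for r in reads if pos < len(r))
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            profile.append({b: 0.0 for b in "ACGT"})
        else:
            profile.append({b: 100.0 * counts[b] / total for b in "ACGT"})
    return profile


def flag_positions(profile, warn=10.0, fail=20.0):
    """Flag positions where |A-T| or |G-C| exceeds a limit (percentage points)."""
    flags = {}
    for pos, pct in enumerate(profile):
        diff = max(abs(pct["A"] - pct["T"]), abs(pct["G"] - pct["C"]))
        if diff > fail:
            flags[pos] = "FAIL"
        elif diff > warn:
            flags[pos] = "WARN"
    return flags
```

For example, `flag_positions(per_base_content(list(read_sequences("SRR2637682_1.fastq.bz2"))))` would list the flagged positions. If they are confined to roughly the first 10-15 positions, the failure is consistent with the library-preparation bias described above rather than a general quality problem.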
There are a lot of posts on Biostars about FastQC. For example:
Questions regarding proprocess for raw data and usage of FastQC
What's wrong with this sample? (kmers found by FastQC of RNA-Seq)
Understanding Fastqc Output- Please Help
GC content and Kmer
etc.