Question

FastQC report: should we not use the fastq data for Tophat+Cuffdiff if any of its modules is marked as "red cross"?

8.8 years ago

tunl ▴ 90

Some of our FastQC reports have a module marked as “red cross” (interpreted as “very unusual”), for example, “Kmer Content”.

We need to run Tophat+Cuffdiff on those fastq data, so I am wondering whether we should avoid using a fastq file for Tophat+Cuffdiff if any of its modules is marked as “red cross”.

As for “Kmer Content”, how important is this module?

I read the online documentation on “Kmer Content” (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/3%20Analysis%20Modules/11%20Kmer%20Content.html ), but could not find how much weight this module should carry.

Are the modules listed in order of significance (that is, are the top ones more significant than the bottom ones)?

Also, what tools can be used to fix the problems reported by FastQC?

Any ideas and advice would be greatly appreciated!

Thank you very much!

RNA-Seq FastQC Tophat Cuffdiff • 2.5k views

score 1 · Answer 1 · 2016-07-22

The red X's come from some threshold decisions that Simon had to make when designing the software. These thresholds are configurable (there is a file you can edit, fastqc-0.11.3/FastQC/Configuration/limits.txt). Having an X show up does not automatically disqualify a dataset.
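If you want to see which thresholds are currently in effect before editing that file, a small script can list them. This is only a minimal sketch: it assumes limits.txt uses whitespace-separated `module level value` lines with `#` comments, and the sample values below are illustrative rather than copied from any particular FastQC release. `parse_limits` is a hypothetical helper, not part of FastQC.

```python
def parse_limits(text):
    """Parse limits.txt-style text into {module: {level: value}}.

    Assumes each non-comment line is 'module level value'
    (whitespace-separated); lines that do not fit are skipped.
    """
    limits = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split()
        if len(parts) != 3:
            continue
        module, level, value = parts
        try:
            limits.setdefault(module, {})[level] = float(value)
        except ValueError:
            continue  # non-numeric value: ignore rather than guess
    return limits


# Illustrative sample only; check your own limits.txt for the real values.
SAMPLE = """
# Kmer module thresholds (illustrative)
kmer    warn    2
kmer    error   5
duplication warn 70
"""

limits = parse_limits(SAMPLE)
for module, levels in sorted(limits.items()):
    print(module, levels)
```

To use it on a real installation, read the file contents with `open(path).read()` and pass them to `parse_limits`.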

For example, in an experiment where you expect enrichment of some sequence (which may lead to high duplication, etc.) you would want to see a red X. So use the FastQC results as a guide for deciding how to handle your datasets from that point on, not as a hard pass/fail decision.
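One way to treat the report as a guide is to list the flagged modules for each sample and review them by hand, rather than dropping any sample with a red X. Here is a minimal sketch, assuming the `summary.txt` that FastQC writes alongside each report has tab-separated `STATUS<TAB>module<TAB>filename` lines. The sample text and the helper name `flagged_modules` are made up for illustration.

```python
def flagged_modules(summary_text):
    """Collect WARN and FAIL modules from summary.txt-style text.

    Returns {'WARN': [...], 'FAIL': [...]} so a person can review
    each flag in context instead of auto-rejecting the sample.
    """
    flagged = {"WARN": [], "FAIL": []}
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        status, module = fields[0].strip(), fields[1].strip()
        if status in flagged:
            flagged[status].append(module)
    return flagged


# Made-up example input; a real file comes from the FastQC output folder.
SAMPLE_SUMMARY = "\n".join([
    "PASS\tBasic Statistics\tsample.fastq",
    "WARN\tSequence Duplication Levels\tsample.fastq",
    "FAIL\tKmer Content\tsample.fastq",
])

result = flagged_modules(SAMPLE_SUMMARY)
print("Review before deciding:", result["FAIL"])
print("Worth a look:", result["WARN"])
```

The script only reports the flags. Whether a flagged module matters still depends on the experiment, as in the enrichment example above.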

Dr. Simon Andrews has several informative blog posts (including FastQC observations) at this new site.