Explaining a FastQC report: an example
8.0 years ago

Hi there!

I have run quality control on some NGS FASTQ files with FastQC. I've read the manual and the explanations of the warning and failure reasons, but I can't tell whether, overall, my data is good or bad. It's probably bad, but please take a look at these screenshots. Maybe someone has an idea why the data looks this way.

[Three FastQC report screenshots; images not preserved]

---
8.0 years ago
GenoMax 147k

This looks like NextSeq data. Having a few red "X" show up on FastQC does not indicate bad data. You should consider them "things to keep in mind" as you proceed with further analysis.

What kind of a dataset is this?

I suggest that you take a look at several blog posts by Dr. Simon Andrews at this link. They should prove useful and may answer some of your questions/doubts.

---

I've seen similar things with NextSeq data. Perhaps the OP could try trimming the poly-G tails, since those can also get high quality scores (on two-colour chemistry, a cycle with no signal is called as G).
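A minimal sketch of what poly-G tail trimming does, in Python. The helper name `trim_poly_g` and the `min_run` threshold of 10 are hypothetical choices for illustration; it only removes an exact trailing run of G and ignores mismatches. For real data, a dedicated trimmer such as fastp (which has a `--trim_poly_g` option) is the more sensible choice.

```python
def trim_poly_g(seq: str, qual: str, min_run: int = 10) -> tuple[str, str]:
    """Hypothetical helper: drop a trailing run of G bases, plus the matching
    quality characters, when the run is at least `min_run` bases long."""
    run = len(seq) - len(seq.rstrip("G"))  # length of the trailing G run
    if run >= min_run:
        cut = len(seq) - run
        return seq[:cut], qual[:cut]
    return seq, qual  # short G runs are left alone


if __name__ == "__main__":
    # Made-up read: 10 real bases followed by a 15-base poly-G tail,
    # all with high quality ("I"), as can happen on two-colour instruments.
    read = "ACGTTAGCCA" + "G" * 15
    quals = "I" * len(read)
    trimmed, trimmed_q = trim_poly_g(read, quals)
    print(trimmed)  # ACGTTAGCCA
```

Keeping the sequence and quality strings in step matters: FASTQ requires both to have the same length after trimming.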

---

These FASTQ files were generated on an Illumina MiniSeq using the TruSeq Amplicon kit. We are analysing human DNA.

---

If these are amplicons, then the high duplication levels in the plot are expected, because many reads start at the same primer positions. The unusual GC plot can probably be explained the same way. If the on-board MiniSeq analysis package has completed all of the analysis and the results look reasonable, you can move on to further analysis.
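To see why amplicon data trips the duplication check, here is a simplified Python sketch. It is not FastQC's exact algorithm (FastQC samples and truncates reads); it just counts exact sequence copies and reports the percentage of reads that would remain after deduplication, together with a histogram of copy numbers. The function name and the toy reads are hypothetical.

```python
from collections import Counter


def duplication_summary(reads: list[str]) -> tuple[float, Counter]:
    """Simplified duplication check (hypothetical, not FastQC's code).

    Returns the percentage of reads left after exact-sequence deduplication
    and a histogram mapping copy number -> number of distinct sequences."""
    if not reads:
        raise ValueError("no reads supplied")
    seq_counts = Counter(reads)                 # sequence -> copies
    remaining = 100.0 * len(seq_counts) / len(reads)
    levels = Counter(seq_counts.values())       # copies -> distinct sequences
    return remaining, levels


if __name__ == "__main__":
    # Toy amplicon-like data: 3 target regions, 100 identical reads each.
    amplicon_reads = (["ACGTACGTAA"] * 100 + ["GGCATTACGA"] * 100
                      + ["TTAGCCGATC"] * 100)
    pct, levels = duplication_summary(amplicon_reads)
    print(f"{pct:.1f}% remaining, levels={dict(levels)}")
    # 1.0% remaining, levels={100: 3}
```

A shotgun library from a large genome would give a value close to 100% here, whereas amplicons collapse to a handful of distinct sequences, which is what a duplication warning reflects rather than a sign of a failed run.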

---

I think that the MiniSeq uses the same 2-colour chemistry as NextSeq.

---
8.0 years ago
mastal511 ★ 2.1k

If your data come from a single species, the Per Sequence GC Content plot doesn't look very good: for a normal random library, FastQC expects a roughly normal distribution centred on the genome's overall GC content. It might improve after trimming if there is a lot of adapter sequence in the data. It also depends on what kind of experiment the data come from.
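A rough Python sketch of what the per-sequence GC plot summarises: the GC percentage of each read, tallied into a histogram. The helper names are hypothetical, and FastQC's own binning and smoothing differ. With toy amplicon-like reads from two regions, the result is two sharp spikes instead of the single broad peak expected from a whole-genome library.

```python
from collections import Counter
from typing import Optional


def gc_percent(seq: str) -> Optional[int]:
    """GC content of one read, rounded to a whole percent.
    Non-ACGT bases (e.g. N) are ignored; returns None if nothing is called."""
    called = [b for b in seq.upper() if b in "ACGT"]
    if not called:
        return None
    gc = sum(1 for b in called if b in "GC")
    return round(100 * gc / len(called))


def gc_histogram(reads: list[str]) -> Counter:
    """Tally reads by GC percent (hypothetical helper), skipping all-N reads."""
    return Counter(p for p in map(gc_percent, reads) if p is not None)


if __name__ == "__main__":
    # Toy data: two amplicons with very different GC content, 100 copies each.
    reads = ["GCGCGCATAT"] * 100 + ["ATATATGCAT"] * 100
    print(dict(gc_histogram(reads)))  # {60: 100, 20: 100}
```

Because amplicon reads come from a few fixed regions rather than the whole genome, their GC values pile up at a few positions, so a "failed" GC module is expected for this kind of data.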

---

We are working with human DNA from cases with an osteogenesis imperfecta phenotype. All of the FASTQ files were generated on an Illumina MiniSeq, and some of the bioinformatics steps (for example FASTQ generation, mapping and indexing, and variant calling) were performed by the Local Manager software. All of those steps used the default settings.

---

If these are amplicons then that might explain the GC plot, because you have many copies of some regions of the genome, rather than the whole genome.
