very confused with GC bias
5.5 years ago
9521ljh ▴ 50

I have fastq files and found that the Per sequence GC content plot is not well shaped. Therefore I think the sample is contaminated.

[Image: FastQC Per sequence GC content plot]

But does this GC content failure mean GC bias?

I ask because I think GC bias is related to coverage and read depth (a problem that arises after mapping).

But the picture above is not from mapped data, just from a fastq file.

Am I right in thinking that GC content is different from GC bias?

fastqc • 7.0k views

Hi, your result seems fine; your distribution is close to the theoretical one :)


Is this from raw sample data, or did you process it already?


It is a raw fastq file.

5.5 years ago

But does this GC content failure mean GC bias?

What you see is that you have more reads with a GC content greater than 50% than FastQC would expect given a normal distribution based on the mode of your reads' GC content. This may be indicative of GC bias, but it doesn't have to be, and it matters less if you're not too interested in quantitative measures down the road. Keep calm and carry on, and just keep this in the back of your mind before drawing strong conclusions, e.g. about interesting enrichments seen for regions with 50-60% GC content.
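To make the "normal distribution based on the mode" idea concrete, here is a rough sketch of that kind of comparison. It is not FastQC's actual code: the function names are made up for illustration, and estimating the spread as the root-mean-square distance from the mode is my own simplifying assumption. FastQC's documentation describes a warning when the summed deviation from the normal curve exceeds 15% of the reads and a failure above 30%.

```python
import math
from collections import Counter


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def summed_deviation_percent(gc_percents):
    """Summed |observed - expected| over GC bins 0..100, as a % of all reads.

    gc_percents: per-read GC percentages (values between 0 and 100).
    The expected curve is a normal distribution centred on the observed mode.
    Assumption: the spread is the RMS distance of the reads from the mode.
    """
    counts = Counter(round(g) for g in gc_percents)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mode = max(counts, key=counts.get)
    sd = math.sqrt(sum(n * (g - mode) ** 2 for g, n in counts.items()) / total)
    if sd == 0:
        return 0.0  # every read has the same GC%, nothing to compare against
    deviation = 0.0
    for g in range(101):
        expected = normal_pdf(g, mode, sd) * total
        deviation += abs(counts.get(g, 0) - expected)
    return 100.0 * deviation / total
```

A single clean peak should give a small value, while a second peak (for example from a contaminant with a different GC content) or a heavy shoulder above 50% pushes the value up.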

Because I think GC bias is related to coverage and read depth (a problem that arises after mapping)

The GC content of each read can be determined irrespective of its location in the genome; after all, you only need to tally the types of bases you've sequenced, which is exactly the type of information that's stored in a fastq file.
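As a minimal sketch of that tallying, assuming a plain four-lines-per-record fastq and ignoring N calls when computing the percentage (both are my assumptions, and the function names are only illustrative):

```python
import io


def gc_percent(seq):
    """GC percentage of one read, counting only called A/C/G/T bases."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0  # read consists only of N's (or is empty)
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def per_read_gc(handle):
    """Yield the GC percentage of every record in a 4-line-per-record fastq stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            yield gc_percent(line.strip())


# Tiny in-memory example; a real file would be opened with open() or gzip.open(..., "rt").
example = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATGC\n+\nIIII\n")
values = list(per_read_gc(example))
```

No reference genome or alignment is involved at any point, which is why FastQC can draw this plot from an unmapped fastq file.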

But you are right insofar as that FastQC's assumption about what a uniform sampling of your organism's genome should look like might be incorrect.

Am I right in thinking that GC content is different from GC bias?

GC content simply describes the number of G's and C's that you sequence in relation to the number of A's and T's. GC bias is typically used to describe the fact that the enzymes and conditions used for PCR amplification tend to amplify fragments with modest to medium-high GC content more efficiently. There will always be some sort of GC bias in Illumina-based sequencing (the reference by Yuval Benjamini and Terry Speed that Ranan pointed to is an enlightening read in that regard); it mostly becomes an issue if you are trying to compare the read numbers of different samples where one sample (type) had only mild GC bias while the other one shows dramatic GC bias.
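If you do want to look at GC bias in the coverage sense after mapping, one simple way is to relate read counts per genomic window to the GC content of each window. The sketch below assumes you have already produced `(gc_percent, read_count)` pairs for fixed-size windows with whatever tools you prefer; the function name and the 10% bin width are hypothetical choices of mine, not a published method.

```python
from collections import defaultdict


def relative_coverage_by_gc(windows, bin_width=10):
    """Mean read count per GC bin, divided by the overall mean read count.

    windows: iterable of (gc_percent, read_count) pairs, one per genomic window.
    Returns {bin_start: relative_coverage}; values near 1.0 in every bin
    suggest little GC dependence, values far from 1.0 suggest GC bias.
    """
    sums = defaultdict(float)
    sizes = defaultdict(int)
    total = 0.0
    n = 0
    for gc, count in windows:
        # Put 100% GC into the last bin instead of creating a bin of its own.
        start = min(int(gc // bin_width) * bin_width, 100 - bin_width)
        sums[start] += count
        sizes[start] += 1
        total += count
        n += 1
    if n == 0 or total == 0:
        return {}
    overall_mean = total / n
    return {b: (sums[b] / sizes[b]) / overall_mean for b in sorted(sums)}
```

Comparing these profiles between samples is where the problem described above shows up: a flat profile in one sample and a strongly peaked one in another means their read counts are not directly comparable.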

5.5 years ago

This is nothing to worry about. It simply shows the GC content of your read data. I would not say it deviates severely from the expected curve. It could perhaps be due to the organism you work on. Moreover, FastQC is very strict in its evaluation.

Here is an interesting link about all this: QCfail

What I am a little surprised about is that you have all green checks in the overview; I've seen this only very rarely :/

5.5 years ago
chen ★ 2.5k

You should take a look at the GC content curves.

The fastp tool may help, see: https://github.com/OpenGene/fastp


Some sequencers, like the Illumina NovaSeq, may produce polyG at the end of reads, which may affect the GC curve. Use fastp to trim polyG and check the post-filtering data.
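To show why a polyG tail shifts the GC curve, here is a deliberately simplified sketch. It is not fastp's algorithm (fastp has its own polyG trimming, see its `--trim_poly_g` option); the function names and the minimum run length of 10 are assumptions for illustration only.

```python
def trim_poly_g(seq, min_len=10):
    """Remove a trailing run of G's if it is at least min_len bases long."""
    stripped = seq.rstrip("G")
    run_length = len(seq) - len(stripped)
    return stripped if run_length >= min_len else seq


def gc_fraction(seq):
    """Fraction of G and C bases in a read (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


read = "ATATATGCAT" + "G" * 20   # 20% GC insert followed by a 20 bp polyG tail
before = gc_fraction(read)
after = gc_fraction(trim_poly_g(read))
```

Here `before` is much higher than `after`, so a library with many such tails gets extra reads on the high-GC side of the plot even though the underlying fragments are not GC-rich.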


Hi, I know this comment was left long ago, but I wanted to ask why there would be polyG at the end of reads. I thought this would be due to reads with no signal.
