Question

Reporting results from fastp Read GC content


2.5 years ago

salman_96 ▴ 70

Hi, I have some QC results for a paired-end sequencing run (mouse samples), before and after filtering. I am unable to interpret the Read GC content plot, and I am not sure whether the results are fine in this case. I am new to both fastp and MultiQC and would really appreciate your help with interpreting the results. I am mainly concerned about the irregular pattern of the blue and red lines, both before and after filtering.

[Figure: Average GC content over each base of all reads before filtering]

fastp MultiQC PE sequencing GC • 1.9k views



Hi,

I am sorry, I forgot to add that this is the mouse genome.

Regards

Reply • 2.5 years ago by salman_96 ▴ 70

score 2 · Answer 1 · 2022-06-09

In most of your samples, the GC content is evenly distributed along the read, between 45 and 52%. For a human genome you would expect approx. 41%, so it is likely of different origin. Trimming removed the slight AT bias within the first 10 bases of the reads, so all of this is pretty much as expected.
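To make clear what the plot is showing, here is a minimal Python sketch of a per-position GC profile. It is only an illustration and not necessarily how fastp computes its numbers: the function name `per_position_gc`, the choice to skip `N` bases, and the toy reads are all assumptions of mine.

```python
from collections import Counter


def per_position_gc(reads):
    """Return the GC percentage at each read position across all reads.

    `reads` is any iterable of sequence strings. N bases are skipped
    (an assumption made for this sketch, not a documented fastp rule).
    """
    gc_counts = Counter()  # position -> number of G or C bases
    totals = Counter()     # position -> number of non-N bases
    for seq in reads:
        for pos, base in enumerate(seq.upper()):
            if base == "N":
                continue
            totals[pos] += 1
            if base in "GC":
                gc_counts[pos] += 1
    n_positions = max(totals) + 1 if totals else 0
    return [
        100.0 * gc_counts[p] / totals[p] if totals[p] else 0.0
        for p in range(n_positions)
    ]


# Toy input only; real data would come from your FASTQ files.
example_reads = ["ATGC", "GGCC", "ATAT"]
for position, gc in enumerate(per_position_gc(example_reads), start=1):
    print(f"position {position}: {gc:.1f}% GC")
```

With a diverse library, each position samples many different sequences, so the profile is a nearly flat line close to the overall GC content of whatever was sequenced.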

As you already noticed, the issue is the red samples and the one blue sample, which show a really distinct GC pattern. This indicates a low-complexity library, with many reads sharing a similar sequence pattern, for those samples. Without knowing what you sequenced, it is hard to say whether this is concerning or not.

As a rule of thumb, any targeted sequencing might well result in such patterns, for example an shRNA or sgRNA library. Also, a ChIP-seq for a transcription factor with a strong motif preference could result in a pattern like this. But if this is not targeted sequencing, the libraries likely failed and need to be redone.
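If you want a rough check of library complexity on a subsample of reads, something like the following sketch could help. It is not the duplication estimate that fastp or MultiQC reports: the function name `complexity_summary` and the `prefix_len` parameter are hypothetical, and comparing only read prefixes is a simplification I am assuming here.

```python
from collections import Counter


def complexity_summary(reads, prefix_len=25):
    """Summarise how repetitive a set of reads is.

    Reads are compared by their first `prefix_len` bases (hypothetical
    parameter). Returns the fraction of distinct prefixes and the
    fraction of reads that carry the single most common prefix.
    """
    prefixes = Counter(seq[:prefix_len].upper() for seq in reads)
    total = sum(prefixes.values())
    if total == 0:
        return {"distinct_fraction": 0.0, "top_fraction": 0.0}
    _, top_count = prefixes.most_common(1)[0]
    return {
        "distinct_fraction": len(prefixes) / total,
        "top_fraction": top_count / total,
    }


# Toy input only: eight identical reads plus two unrelated ones.
toy_reads = ["ACGT" * 10] * 8 + ["TTGACCAGTA" * 4, "GGCATTCAGC" * 4]
print(complexity_summary(toy_reads))
```

A low distinct fraction together with a high top fraction would be expected for something like an sgRNA library, but for a whole-genome or standard RNA-seq library it would point towards a failed prep.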