Hi,
I have some QC results for paired-end sequencing before and after filtering (mouse samples). I am unable to interpret the "Read GC content" plot, and I am not sure whether the results are fine in this case. I am new to both fastp and MultiQC and would really appreciate your help interpreting the results. I am mainly concerned about the irregular pattern of the blue and red lines, both before and after filtering.
In most of your samples, the GC content is evenly distributed between 45% and 52%. For a human genome you would expect approximately 41%, so the material is likely of a different origin. Trimming removed the slight AT bias within the first 10 bases of the reads, so all of this is as expected.
As you already noticed, the issue is the red samples and the one blue sample, which show a really distinct GC pattern. This indicates a low-complexity library, with many reads sharing a similar sequence pattern in those samples. Without knowing what you sequenced, it is hard to say whether this is concerning or not.
As a rule of thumb, any targeted sequencing might well result in such patterns, for example an shRNA or sgRNA library. A ChIP-seq for a transcription factor with a strong motif preference could also produce a pattern like this. But if this was not targeted sequencing, the libraries likely failed and need to be redone.
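If you want to put a number on "irregular" rather than judging the plot by eye, here is a minimal sketch of the idea: compute the GC fraction at each read position and measure how much it fluctuates along the read. The function names, the `skip` parameter, and the use of a standard deviation as the summary are my own assumptions for illustration; this is not how fastp or MultiQC compute anything internally.

```python
from statistics import pstdev


def per_position_gc(reads):
    """Return the GC fraction at each read position (cycle).

    `reads` is an iterable of sequence strings; N bases are ignored.
    """
    reads = list(reads)
    max_len = max((len(r) for r in reads), default=0)
    gc = [0] * max_len
    total = [0] * max_len
    for read in reads:
        for i, base in enumerate(read.upper()):
            if base == "N":
                continue  # uncalled bases carry no GC information
            total[i] += 1
            if base in "GC":
                gc[i] += 1
    # Positions with no called bases are reported as 0.0
    return [g / t if t else 0.0 for g, t in zip(gc, total)]


def gc_spread(reads, skip=10):
    """Standard deviation of per-position GC, ignoring the first `skip` cycles.

    `skip` is a hypothetical parameter to leave out the start of the read,
    where a mild composition bias is common.
    """
    curve = per_position_gc(reads)[skip:]
    return pstdev(curve) if len(curve) > 1 else 0.0
```

A diverse library should give a nearly flat curve and therefore a small spread, while a library dominated by a few sequences gives a jagged curve and a large spread. Any cutoff between the two is something you would have to choose by comparing against your own good samples.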
Hi, thank you for the detailed information. I have located the two noisy samples (the blue and the red one) and plan to exclude them before running any downstream analysis. The sequencing results I am analyzing are from mouse organoids (paired-end FASTQ files).
Hi,
I am sorry, I forgot to add that this is a mouse genome.
Regards