Bad Per sequence GC content
3.2 years ago

Hello, Biostars!

I have two FASTQ files of paired-end reads that I want to use for SNV calling. Quality checking in FastQC showed a bad Per base sequence content and warnings for both Per sequence GC content and Sequence Length Distribution, as you can see in the pictures below.

[Figures: Per base sequence content before trimming; Per sequence GC content before trimming; Sequence Length Distribution]
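For context on the Per base sequence content result: as far as I understand the FastQC documentation, that module warns when the difference between A and T, or between G and C, exceeds 10% at any position and fails above 20%. Below is a minimal Python sketch of that check under those assumptions. The function names are hypothetical, and percentages are computed over A, C, G and T only, which may differ slightly from FastQC's own calculation.

```python
from collections import Counter

def per_base_content(reads):
    """Percentage of A, C, G, T at each position, over reads long enough to cover it."""
    max_len = max(len(r) for r in reads)
    profile = []
    for pos in range(max_len):
        # Shorter reads simply do not contribute to positions past their end.
        counts = Counter(r[pos] for r in reads if pos < len(r))
        total = sum(counts[b] for b in "ACGT")
        if total == 0:
            profile.append({b: 0.0 for b in "ACGT"})
        else:
            profile.append({b: 100.0 * counts[b] / total for b in "ACGT"})
    return profile

def skewed_positions(profile, threshold=10.0):
    """0-based positions where |A - T| or |G - C| exceeds `threshold` percentage points."""
    return [
        pos for pos, pct in enumerate(profile)
        if abs(pct["A"] - pct["T"]) > threshold or abs(pct["G"] - pct["C"]) > threshold
    ]
```

Passing `threshold=20.0` gives the (assumed) failure cutoff instead of the warning cutoff. Skewed positions clustered at the start of the reads are what motivates cropping the first few bases.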

My idea was to cut off the first 6 bases and around 10 at the end. I used Trimmomatic with the following command:

TrimmomaticPE -threads 32 -phred33 R1.fastq R2.fastq Trimmed/FP.fastq Trimmed/FUN.fastq Trimmed/RP.fastq Trimmed/RUN.fastq ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10 HEADCROP:6 SLIDINGWINDOW:4:30 CROP:90
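To see why trimmed reads end up with different lengths, here is a simplified Python sketch of the three position-based steps in the command above, applied in the order given (ILLUMINACLIP is left out). It works on the list of Phred scores for one read; the same slice would apply to the bases. The SLIDINGWINDOW logic is an approximation (cut at the start of the first 4-base window whose mean quality drops below 30), not Trimmomatic's exact implementation, and the helper names are hypothetical.

```python
def headcrop(quals, n):
    """HEADCROP:n - drop the first n bases."""
    return quals[n:]

def sliding_window(quals, window, threshold):
    """Approximate SLIDINGWINDOW:window:threshold (not Trimmomatic's exact code).

    Scan windows from the 5' end and cut the read at the start of the first
    window whose mean quality is below `threshold`.
    """
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < threshold:
            return quals[:start]
    return quals

def crop(quals, length):
    """CROP:length - keep at most `length` bases from the start."""
    return quals[:length]

def trim(quals):
    """HEADCROP:6, then SLIDINGWINDOW:4:30, then CROP:90, in command-line order."""
    quals = headcrop(quals, 6)
    quals = sliding_window(quals, 4, 30)
    return crop(quals, 90)
```

For example, a uniformly high-quality 100-base read, `trim([35] * 100)`, comes out 90 long, while `trim([35] * 50 + [10] * 50)` keeps only 41 positions, so the output lengths vary from read to read.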

After this I got a pretty strange GC content, which appears to be worse than it was before trimming, and the Sequence Length Distribution still has a warning.

[Figures: Bad Per sequence GC content after trimming; a second image without a description]
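On the Sequence Length Distribution warning: my reading of the FastQC documentation is that this module warns whenever the reads are not all the same length and fails only if some read has length zero. Quality trimming cuts each read at a different point, so a warning is expected afterwards. A minimal Python sketch of that rule (the function names are hypothetical):

```python
from collections import Counter

def length_distribution(reads):
    """Map each read length to the number of reads with that length."""
    return Counter(len(r) for r in reads)

def length_status(reads):
    """Assumed FastQC rule: fail on any zero-length read, warn on mixed lengths."""
    dist = length_distribution(reads)
    if dist.get(0, 0) > 0:
        return "fail"
    if len(dist) > 1:
        return "warn"
    return "pass"
```

So `length_status(["ACGT", "ACG"])` returns `"warn"` even though nothing is wrong with either read.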

The basic statistics before and after trimming are as follows:

[Figures: Statistics before trimming; Statistics after trimming]

Does anyone have any idea why this happened, and what I can do to improve the quality of the data? Any help is appreciated!

3.2 years ago

This data is fine. FastQC often fails for these metrics because it expects a nearly perfect uniform distribution, and biology doesn't agree with that. As for the base skew, it is likely due to adapter trimming. Again, it's not worth worrying about: any decent aligner will soft-clip any sequence that doesn't align, so it will be ignored for variant calling anyway.
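Regarding the GC content warning: according to the FastQC documentation (as I read it), the Per sequence GC content module compares the per-read GC histogram with a theoretical normal distribution, warns when the summed deviations exceed 15% of the reads and fails above 30%. The Python sketch below approximates that check. It is a simplification: the reference normal here uses the sample mean and standard deviation, whereas FastQC derives its reference from the observed data differently (the documentation mentions the modal GC content). All names are hypothetical.

```python
import math

def gc_percent(read):
    """GC percentage of one read, counting only A, C, G and T (N is ignored)."""
    acgt = [b for b in read.upper() if b in "ACGT"]
    if not acgt:
        return None
    gc = sum(1 for b in acgt if b in "GC")
    return 100.0 * gc / len(acgt)

def gc_deviation_fraction(reads):
    """Summed |observed - expected| over 1% GC bins, divided by the read count.

    Simplified reference: a normal curve with the sample mean and standard
    deviation, evaluated at each integer GC percentage (bin width 1).
    """
    values = [g for g in (gc_percent(r) for r in reads) if g is not None]
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)

    observed = [0] * 101
    for v in values:
        observed[round(v)] += 1

    if sd < 1e-9:
        # All reads share one GC value: the reference is a single spike.
        expected = [n if b == round(mean) else 0 for b in range(101)]
    else:
        norm = n / (sd * math.sqrt(2 * math.pi))
        expected = [norm * math.exp(-((b - mean) ** 2) / (2 * sd ** 2)) for b in range(101)]

    deviation = sum(abs(o - e) for o, e in zip(observed, expected))
    return deviation / n

def gc_status(reads, warn=0.15, fail=0.30):
    """Map the deviation fraction to pass / warn / fail with the assumed thresholds."""
    frac = gc_deviation_fraction(reads)
    if frac > fail:
        return "fail"
    if frac > warn:
        return "warn"
    return "pass"
```

For example, `gc_status(["AAAA"] * 10 + ["GGGG"] * 10)` returns `"fail"` because the histogram has two spikes far from a single normal curve, while ten identical 50% GC reads return `"pass"`.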

Continue with your analysis. If things look wonky downstream, then you can come back and see if something more nefarious is occurring, though I doubt that's the case.
