Question

Why did the sequence length distribution fail in FastQC after adapter trimming?


4.9 years ago

newbie ▴ 140

Dear all,

I have downloaded some already published raw data (FASTQ files). I ran an initial QC and found adapter content in both the forward and reverse reads.

Below are the FastQC results for the forward and reverse reads before adapter trimming:

[image: FastQC results before adapter trimming]

To remove the adapter content I ran cutadapt as follows:

cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT -o tr_sample_R1.fastq.gz -p tr_sample_R2.fastq.gz sample_R1.fastq.gz sample_R2.fastq.gz

After adapter trimming, the FastQC results look like this:

[image: FastQC results after adapter trimming]

So, I have some questions:

1) Before adapter trimming the sequence length distribution looked fine, but after adapter trimming FastQC flags it. Why is that?

2) I see some bias in the first 10-15 bases. What should I do about it? Is it really a problem?

3) Why does the GC content have multiple peaks?

Please help me understand these points. Thanks in advance.

RNA-Seq fastqc qualitycontrol adaptertrimming • 2.4k views

updated 4.9 years ago by Aspire ▴ 390 • written 4.9 years ago by newbie ▴ 140

score 0 · Answer 1 · 2020-09-04


4.9 years ago

swbarnes2 15k

I don't think any of this is a problem. You didn't really even have to trim adapters.


score 0 · Answer 2 · 2020-09-06

For the bias in the first bases, see https://sequencing.qcfail.com/articles/positional-sequence-bias-in-random-primed-libraries/.

As for the sequence length distribution, think about what cutting adapters means: once the adapters are removed, would you still expect all reads to have the same length?
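To make the point about read lengths concrete, here is a minimal Python sketch. It is a toy model, not how cutadapt actually works: it assumes exact adapter matches, uses only the 13-base prefix shared by the two adapters in the cutadapt command from the question, and the insert sequences, the 50 bp read length, and the helper names (`trim_3prime_adapter`, `simulate_read`, `length_distribution`) are all made up for illustration.

```python
from collections import Counter

# Shared prefix of the two adapter sequences in the cutadapt command above.
ADAPTER = "AGATCGGAAGAGC"
READ_LEN = 50  # hypothetical fixed read length produced by the sequencer


def simulate_read(insert: str, read_len: int = READ_LEN) -> str:
    """Toy sequencer: reads through the insert, then into the adapter, then filler."""
    return (insert + ADAPTER + "A" * read_len)[:read_len]


def trim_3prime_adapter(read: str, adapter: str = ADAPTER) -> str:
    """Cut the read at the first exact adapter match (cutadapt itself allows mismatches)."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


def length_distribution(reads):
    """Count how many reads have each length."""
    return Counter(len(r) for r in reads)


# Hypothetical inserts: one longer than the read (80 bp), two shorter (30 and 20 bp).
inserts = ["ACGT" * 20, "TTGCA" * 6, "GGCAT" * 4]

raw = [simulate_read(i) for i in inserts]
trimmed = [trim_3prime_adapter(r) for r in raw]

print("before trimming:", dict(length_distribution(raw)))
print("after trimming: ", dict(length_distribution(trimmed)))
```

Before trimming every read has the same length because the sequencer always reports the same number of cycles. After trimming, each read whose insert was shorter than the read length is cut back to its insert length, so the distribution spreads out. As far as I know, FastQC raises a warning for this module whenever the reads are not all the same length and only fails it when some reads have length zero; if that is what happened here, cutadapt's `--minimum-length` (`-m`) option can be used to discard such reads.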