Question

Per Base Sequence Content - continuous rise in G%

0

7 months ago

pl.terzian • 0

I am new to quality control, so I am having trouble interpreting the FastQC results for my CUT&RUN dataset.

These are two "Per Base Sequence Content" plots from two libraries of the same sample.

[Plot: first library]

The plot above doesn't seem to show any unexpected bias: as I understand it, the bias is related to the tagmentation method or the Illumina adapters and should be removed during alignment.

The plot below shows an increase in G% along the reads. Both libraries are "failing" according to FastQC, but I read that most biases should not prevent you from starting downstream analysis.

So I would like to know if anyone can help me understand what is happening in the second plot, so that I can decide whether to keep or remove reads from this library for downstream analysis.

Many thanks!

[Plot: second library]

fastqc • 735 views

updated 7 months ago by GenoMax 147k • written 7 months ago by pl.terzian • 0

1

This is only speculation, but assuming this was sequenced with two-color chemistry, it is possible that a fraction of your library has short inserts. With those, once the read runs through the adapter on the other end, the sequencer may simply be generating "G = No Signal = No calls". You can trim your data for stretches of poly-G.
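To illustrate what trimming poly-G stretches means, here is a minimal Python sketch. The function name `trim_poly_g` and the `min_len=10` threshold are assumptions made for illustration and are not taken from any particular tool. In practice a dedicated trimmer would normally do this job (fastp, for example, has a `--trim_poly_g` option).

```python
def trim_poly_g(seq, qual, min_len=10):
    """Remove a trailing run of G's from a read if it is at least min_len long.

    Assumption: a long run of G's at the 3' end is a no-signal artifact of
    two-color chemistry rather than real sequence. min_len is a hypothetical
    threshold chosen for this example.
    """
    stripped = seq.rstrip("G")          # read without its trailing G's
    run_length = len(seq) - len(stripped)
    if run_length < min_len:
        return seq, qual                # run too short: leave the read alone
    # Cut the quality string at the same position to keep both in sync.
    return stripped, qual[:len(stripped)]


if __name__ == "__main__":
    read = "ACGTACGT" + "G" * 20
    print(trim_poly_g(read, "I" * len(read)))
```

A read made up entirely of G's comes back empty, so it would also need to be dropped afterwards by a minimum-length filter.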

7 months ago by GenoMax 147k

0

Thanks a bunch. I actually have poly-G sequences as the top overrepresented sequence, but I did not know what that meant.

7 months ago by pl.terzian • 0

0

GenoMax, actually I was misled by the track names and my sample_sheet file. The two plots are the forward and reverse reads (R1 & R2) of the same library. It appears that R1 has 2M+ reads containing a long 'GGGGGGGGGGGGGGGGGGGGGGGGG' stretch, while R2 has only 200k of these poly-G sequences. Is this a behavior you have ever seen?
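One way to check such counts yourself is to tally the G-rich reads in each FASTQ file. The sketch below is only an illustration: the function names, the 0.9 G-fraction cutoff, and the assumption of a standard four-line-per-record FASTQ layout are all choices made for this example.

```python
import gzip


def count_poly_g_reads(lines, min_g_fraction=0.9):
    """Count reads whose sequence is mostly G.

    `lines` is any iterable of FASTQ lines (assumed four lines per record).
    min_g_fraction is a hypothetical cutoff, not a standard value.
    Returns (poly_g_reads, total_reads).
    """
    total = 0
    poly_g = 0
    for i, line in enumerate(lines):
        if i % 4 != 1:                  # only the 2nd line of each record is sequence
            continue
        seq = line.strip().upper()
        if not seq:
            continue
        total += 1
        if seq.count("G") / len(seq) >= min_g_fraction:
            poly_g += 1
    return poly_g, total


def count_in_file(path, min_g_fraction=0.9):
    """Open a plain or gzipped FASTQ file and count its poly-G reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_poly_g_reads(handle, min_g_fraction)
```

Running `count_in_file` on the R1 file and then on the R2 file (paths are up to you) gives the two counts to compare.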

7 months ago by pl.terzian • 0

1

Pure poly-G reads are not usable. You should remove them or they will get dropped in subsequent analysis.
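As a rough sketch of removing such reads from paired-end data, the Python below drops a pair when either mate is almost entirely G, so that R1 and R2 stay in sync. The record layout (`(header, sequence, quality)` tuples), the function names, and the 0.9 cutoff are assumptions for illustration. Whether to drop the whole pair or keep the good mate as a single-end read is a judgment call that this example does not settle.

```python
def is_poly_g(seq, min_g_fraction=0.9):
    """Return True if the read is (almost) entirely G. The cutoff is hypothetical."""
    seq = seq.upper()
    return bool(seq) and seq.count("G") / len(seq) >= min_g_fraction


def filter_pairs(r1_records, r2_records, min_g_fraction=0.9):
    """Yield (r1, r2) pairs in which neither mate is poly-G.

    Each record is assumed to be a (header, sequence, quality) tuple, and the
    two inputs are assumed to list mates in the same order.
    """
    for r1, r2 in zip(r1_records, r2_records):
        if is_poly_g(r1[1], min_g_fraction) or is_poly_g(r2[1], min_g_fraction):
            continue                    # drop the whole pair to keep files in sync
        yield r1, r2


if __name__ == "__main__":
    r1 = [("@a/1", "G" * 25, "I" * 25), ("@b/1", "ACGTACGT", "I" * 8)]
    r2 = [("@a/2", "TTGCATGC", "I" * 8), ("@b/2", "TGCATGCA", "I" * 8)]
    for pair in filter_pairs(r1, r2):
        print(pair[0][0], pair[1][0])
```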

7 months ago by GenoMax 147k