Question

Why is G content high and A/C/T content low in the first 10 bp of Read 2 in paired-end whole-genome bisulfite sequencing?

0

5.3 years ago

Floyd • 0

Hi there!

I'm new to bisulfite sequencing and recently received my paired-end WGBS raw files. Read 1 looks OK in FastQC, but Read 2 has a weird 'Per base sequence content' plot. Why is the G content so high? Read 2 is supposed to have very low G content, given the C-to-T conversion seen in Read 1.

Read1 figure

Read2 figure

The quality scores in both files are high (Q > 30), and both files passed the FastQC adapter content check.

The read headers in the FASTQ files start with @AXXX.., so I guess the library was sequenced on a NovaSeq. Although that platform is known to produce false G signals, those Gs show up at the tail of the read rather than the head.

What could cause this problem? Is it wise to trim bases 1-15 of Read 2?

sequencing WGBS Paired-end bisulfite sequencing • 2.7k views

updated 5.3 years ago by Devon Ryan 104k • written 5.3 years ago by Floyd • 0

0

Hi Floyd, I'm currently facing exactly the same problem with my data (fastqc reports look exactly the same as yours). Could you share your experience with the programs you have tried so far to clean your data? Thanks very much in advance.

4.7 years ago by szutre ▴ 10

score 2 · Accepted Answer · 2019-08-07

2

5.3 years ago

Devon Ryan 104k

It's unclear how much of that is conversion bias and how much is just a glitch in the sequencer (my guess is that the latter is the primary cause). Regardless, use an aligner such as bwa-meth that can soft-clip the alignments, then check for methylation bias (e.g., with MethylDackel) and simply exclude the first 10-15 bases of read 2 during methylation extraction.
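To illustrate the "look for methylation bias" step, here is a minimal Python sketch of how one might decide how many leading positions of read 2 to ignore. It assumes you already have per-position counts of methylated and unmethylated calls for read 2 (for example, parsed from an M-bias report). The function name `leading_bias_length`, the toy counts, the 0.05 tolerance, and the 20-position scan window are all illustrative assumptions, not MethylDackel's output format or defaults.

```python
from statistics import median


def leading_bias_length(meth, unmeth, tolerance=0.05, max_scan=20):
    """Return how many leading read positions look biased.

    meth[i] and unmeth[i] are methylated / unmethylated call counts at
    read position i (0-based). A position is treated as biased when its
    methylation fraction differs from the median over all positions by
    more than `tolerance` (a hypothetical threshold).
    """
    # Per-position methylation fraction; None where there are no calls.
    fractions = [m / (m + u) if (m + u) else None for m, u in zip(meth, unmeth)]
    observed = [f for f in fractions if f is not None]
    if not observed:
        return 0
    baseline = median(observed)

    # Find the last biased position within the first `max_scan` positions,
    # so an unbiased position in between does not end the scan early.
    last_biased = 0
    for i, f in enumerate(fractions[:max_scan]):
        if f is not None and abs(f - baseline) > tolerance:
            last_biased = i + 1
    return last_biased


# Toy 50-position read 2 profile: the first 12 positions sit at 40%
# methylation, the rest at 75% (made-up numbers for illustration).
meth = [40] * 12 + [75] * 38
unmeth = [60] * 12 + [25] * 38
n_ignore = leading_bias_length(meth, unmeth)
print(f"Ignore the first {n_ignore} positions of read 2")
```

The returned count is only a starting point. It is still worth looking at the M-bias plot itself before fixing the bounds used for methylation extraction.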

5.3 years ago by Devon Ryan 104k

0

Thank you so much for your answer! I will give those programs a try. I will also test omitting the first 15 bases.

5.3 years ago by Floyd • 0
