Question

FastQC report about WGBS.

8.7 years ago

hxlei613 ▴ 100

Hi~ I'm working on some WGBS data now.

After quality and adapter trimming, the Sequence Duplication Levels and Per sequence GC content modules still fail. In Per sequence GC content, the observed read peak is higher than the theoretical distribution. Is this OK?

Thank you very much for any help!

Here are some pictures from FastQC after trimming.

Per base sequence content: There is a small fluctuation in the first few bases. Should I trim them? The sharp decrease of A at the last position results from very stringent adapter removal, i.e. even a single trailing A at the end is removed.

Per Sequence GC content

Sequence duplication levels: Should I deduplicate reads during quality control (before mapping), or filter them after alignment using deduplicate_bismark?

WGBS Trimming • 4.0k views

ADD COMMENT • link updated 6.6 years ago by Biostar 20 • written 8.7 years ago by hxlei613 ▴ 100

score 0 · Answer 1 · 2016-08-18

8.7 years ago

igor 13k

Don't worry too much about FastQC reports. They are very conservative. It's nearly impossible to have them all pass. You should manually look at the report and see if each section makes sense.

See previous discussion here about FastQC: FastQ quality check : what can we correct ?

And you can use deduplicate_bismark to remove duplicates, which is convenient if you are using Bismark for everything else.
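As a rough illustration of that step, the sketch below only assembles a deduplicate_bismark command line from Python. The helper name `build_dedup_command` and the BAM file name are hypothetical, and the `--paired`, `--single` and `--bam` options should be checked against `deduplicate_bismark --help` for the installed Bismark version.

```python
import shlex


def build_dedup_command(bam_path, paired=True):
    """Assemble (but do not run) a deduplicate_bismark call.

    Hypothetical helper for illustration only; confirm the flags
    against the help text of your installed Bismark version.
    """
    mode = "--paired" if paired else "--single"
    return ["deduplicate_bismark", mode, "--bam", bam_path]


# Hypothetical file name standing in for a Bismark alignment output.
cmd = build_dedup_command("sample_bismark_bt2_pe.bam")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` once Bismark is on the PATH.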

ADD COMMENT • link 8.7 years ago by igor 13k

Thank you very much! Do you mean duplicates do not need to be removed before mapping?

ADD REPLY • link 8.7 years ago by hxlei613 ▴ 100

Correct, there's no reason to bother deduplicating before alignment.

ADD REPLY • link 8.7 years ago by Devon Ryan 105k

Yes. Usually you identify duplicates if two reads (or read pairs) align to the same exact spot in the genome.

ADD REPLY • link 8.7 years ago by igor 13k

For paired-end alignments, does Bismark call a pair a duplicate only if both partner reads start and end at exactly the same positions, or is it enough for one of the partner reads to match?

ADD REPLY • link 8.7 years ago by hxlei613 ▴ 100

Oh, I figured it out. A pair is a duplicate when both partner reads start and end at exactly the same positions. Thank you very much.
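The definition above can be sketched in a few lines of Python. This is a toy model and not Bismark's actual implementation: the `PairedAlignment` record, its field names, and the choice to include chromosome and strand in the key are assumptions made for illustration.

```python
from typing import NamedTuple


class PairedAlignment(NamedTuple):
    # Hypothetical record for one aligned read pair.
    name: str
    chrom: str
    strand: str
    r1_start: int
    r1_end: int
    r2_start: int
    r2_end: int


def deduplicate_pairs(pairs):
    """Keep the first pair seen at each position; drop later copies."""
    seen = set()
    kept = []
    for p in pairs:
        # Both mates must match in start and end to count as a duplicate.
        key = (p.chrom, p.strand, p.r1_start, p.r1_end, p.r2_start, p.r2_end)
        if key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept


pairs = [
    PairedAlignment("A", "chr1", "+", 100, 150, 200, 250),
    PairedAlignment("B", "chr1", "+", 100, 150, 200, 250),  # same as A
    PairedAlignment("C", "chr1", "+", 100, 150, 200, 251),  # mate 2 end differs
    PairedAlignment("D", "chr1", "-", 100, 150, 200, 250),  # other strand
]
kept_names = [p.name for p in deduplicate_pairs(pairs)]
print(kept_names)
```

Here only B is dropped, because it is the only pair whose two mates both match an earlier pair in position and strand.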

ADD REPLY • link 8.7 years ago by hxlei613 ▴ 100