PCR duplicates in RRBS data
4.6 years ago
linelr ▴ 40

Hi!

I am working with DNA methylation in salmon and have recently acquired data from an RRBS experiment. FastQC reports that around 40% of my reads are PCR duplicates, which is quite high. However, I have read that with RRBS data I should not remove duplicates, e.g. by simply removing reads that have the exact same start and stop position in the genome, but this did not come with a proper explanation. It sort of makes sense to me because of the way the library prep is performed: MspI cleaves only at CCGG sites, and the fragments are then size selected, so you will probably end up with fragments that are pretty similar, and FastQC might therefore call them PCR duplicates of each other. This is of course based on my non-exhaustive understanding of these processes.

I can't seem to find any good explanations on how to perform a proper PCR duplicate removal for RRBS data, if that is indeed called for (which I suspect it is).

Does anyone know how to do this or can anyone point me to where I might find this information?

Thanks in advance!

Best, Line

RRBS sequencing • 2.2k views

All right! This makes sense. Thanks a lot! I'll keep what you wrote about FastQC in mind for next time.

Have a good day!


Please reply to comments with Add comment and Add reply; that keeps the thread organized. Thank you. Also, please feel free to upvote and accept good answers.


Sure! Thanks for the reminder

4.6 years ago

I strongly recommend that you do not remove apparent PCR duplicates when processing RRBS data. In data like this we expect very high levels of what look like PCR duplicates, because MspI digestion makes independent fragments start and end at the same CCGG sites. For the most part, these are not real PCR duplicates. Please also note that FastQC's defaults are all intended for whole-genome sequencing, so it will give warnings that you should ignore if you run it on RRBS datasets.
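
To make the reasoning concrete, here is a minimal sketch of an in-silico MspI digest on a made-up sequence. Everything in it is an assumption for illustration only: the toy sequence, the function names `mspi_fragments` and `count_position_duplicates`, and the 40-220 bp size window are hypothetical and not taken from any real pipeline. It only shows that identical genome copies from different cells give fragments with identical coordinates, even though no PCR is involved.

```python
import re

# Toy sequence (hypothetical, not salmon): three CCGG sites in A/T filler.
TOY_GENOME = "A" * 50 + "CCGG" + "T" * 60 + "CCGG" + "A" * 100 + "CCGG" + "T" * 30


def mspi_fragments(seq, min_len=40, max_len=220):
    """Return (start, end) of fragments flanked by two MspI cuts, after size selection.

    MspI cuts C^CGG, so each cut falls one base into a CCGG site.
    The size window is an illustrative assumption.
    """
    cuts = [m.start() + 1 for m in re.finditer(r"(?=CCGG)", seq)]
    fragments = zip(cuts, cuts[1:])
    return [(a, b) for a, b in fragments if min_len <= b - a <= max_len]


def count_position_duplicates(seq, n_cells):
    """Digest n_cells identical genome copies; return (total molecules, unique positions)."""
    # Each cell contributes its own, independent molecule for every fragment.
    molecules = [frag for _ in range(n_cells) for frag in mspi_fragments(seq)]
    return len(molecules), len(set(molecules))


total, unique = count_position_duplicates(TOY_GENOME, n_cells=100)
print(f"{total} independent molecules, {unique} distinct (start, end) positions")
print(f"position-based deduplication would discard {total - unique} of them")
```

In this toy case, 100 cells give 200 independent molecules at only 2 distinct positions, so removing reads by start and stop position would throw away 198 molecules that were never PCR duplicates in the first place.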



