Hi!
I am working with DNA methylation in salmon and have recently acquired data from an RRBS experiment. FastQC reports that around 40% of my reads are PCR duplicates, which is quite high. However, I have read that with RRBS data I should not remove duplicates, e.g. by simply removing reads that have the exact same start and stop positions in the genome, but this advice did not come with a proper explanation. It sort of makes sense to me because of the way the library prep is performed: MspI cleaves only at CCGG sites, and the fragments are then size-selected, so you will probably end up with many fragments that are very similar, and FastQC might therefore flag them as PCR duplicates of each other. This is of course based on my non-exhaustive understanding of these processes.
I can't seem to find any good explanations of how to perform proper PCR duplicate removal for RRBS data, if that is indeed called for (which I suspect it is).
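To make that reasoning concrete, here is a minimal sketch of an in silico MspI digest on a made-up toy sequence. The function name `mspi_fragments`, the toy sequence, and the 40-220 bp size window are all assumptions chosen for illustration, not values from my actual library prep. The point is only that every molecule from a given locus gets the same start and end coordinates, whether or not it is a PCR copy.

```python
import re

MSPI_SITE = "CCGG"  # MspI recognition site
CUT_OFFSET = 1      # MspI cuts between the two Cs (C^CGG)


def mspi_fragments(seq, min_len=40, max_len=220):
    """Return (start, end) coordinates of size-selected MspI fragments.

    min_len and max_len are an assumed size-selection window.
    """
    seq = seq.upper()
    # Position of every cut site along the sequence
    cuts = [m.start() + CUT_OFFSET for m in re.finditer(MSPI_SITE, seq)]
    # Each pair of neighbouring cut sites defines one fragment
    fragments = zip(cuts, cuts[1:])
    return [(s, e) for s, e in fragments if min_len <= e - s <= max_len]


# Toy "genome": AT filler with four CCGG sites (purely hypothetical)
toy_genome = "AT" * 10 + "CCGG" + "AT" * 30 + "CCGG" + "AT" * 50 + "CCGG" + "AT" * 5 + "CCGG"

kept = mspi_fragments(toy_genome)

# Pretend five independent cells each contribute one molecule per kept fragment.
# No PCR is involved here, yet the coordinates are identical across cells.
molecules = [frag for _cell in range(5) for frag in kept]
unique_coordinates = set(molecules)

print("kept fragments:", kept)
print("molecules:", len(molecules), "unique coordinates:", len(unique_coordinates))
```

In this toy case ten independent molecules collapse to two unique coordinate pairs, so a start/stop-based duplicate filter would throw away eight of them even though none is a PCR duplicate.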
Does anyone know how to do this or can anyone point me to where I might find this information?
Thanks in advance!
Best, Line
All right! This makes sense. Thanks a lot! I'll keep what you wrote about FastQC in mind for next time.
Have a good day!
Please reply to comments with Add comment and Add reply; that keeps the thread organized. Thank you. Also, please feel free to upvote and accept good answers.
Sure! Thanks for the reminder.