I'm doing BS-seq on ChIP DNA. With 500M reads coming from <1 ng of ChIP DNA, you can imagine the duplication level is HUGE: FastQC reported duplication rates of 39% and 66% for my two libraries. In my case, I think the proper way to de-duplicate is to set a cutoff value, say 5, that tolerates some PCR duplication (and possibly amplification of distinct DNA fragments that happen to have identical ends). How can I do this in a customized way? The reads are paired-end, and I would prefer to start from an alignment file such as BAM/SAM.
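
To make the cutoff idea concrete, here is a minimal Python sketch. It assumes the cutoff means "keep at most 5 read pairs that share the same fragment coordinates", and that each aligned pair has already been reduced to a chromosome, start, end and strand (for example, by parsing the BAM/SAM file with a library of your choice; that step is not shown). The `Fragment` record and the `cap_duplicates` function are hypothetical names used only for illustration, not part of any existing tool.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical record describing one aligned read pair."""
    name: str    # read-pair name
    chrom: str   # reference sequence name
    start: int   # leftmost aligned position of the fragment
    end: int     # rightmost aligned position of the fragment
    strand: str  # "+" or "-"


def cap_duplicates(fragments: Iterable[Fragment],
                   max_copies: int = 5) -> Iterator[Fragment]:
    """Yield fragments, keeping at most `max_copies` per coordinate key.

    Assumption: two pairs are duplicates of each other when they share
    chromosome, start, end and strand. The first `max_copies` pairs seen
    for a key are kept; any later ones are dropped.
    """
    seen = Counter()
    for frag in fragments:
        key = (frag.chrom, frag.start, frag.end, frag.strand)
        if seen[key] < max_copies:
            seen[key] += 1
            yield frag


if __name__ == "__main__":
    # Seven pairs with identical ends plus one unique pair.
    pairs = [Fragment(f"dup{i}", "chr1", 100, 250, "+") for i in range(7)]
    pairs.append(Fragment("uniq", "chr1", 300, 480, "-"))
    kept = list(cap_duplicates(pairs, max_copies=5))
    print(len(kept), "of", len(pairs), "pairs kept")
```

Keeping the first pairs encountered is an arbitrary choice; one could just as well keep the pairs with the highest mapping or base quality. The counter also holds one entry per distinct fragment in memory, so for a coordinate-sorted file at this read depth it would be worth clearing it whenever the chromosome changes.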