Hey all,
I want to remove duplicates from my bam file.
I use Picard MarkDuplicates with REMOVE_DUPLICATES=true to remove them.
After running Picard to "remove all duplicates", I still found reads in the BAM file that are flagged as duplicates, and I found duplicate clusters that were not removed. I thought Picard removes all reads that are flagged as duplicates?
That's why I used samtools rmdup in paired-end mode. It removes more reads than Picard. But why?
I thought that with Picard I would remove all duplicates (optical and PCR).
I'm confused
Post the exact commands you ran and the samtools flagstat output before and after removing duplicates.
I'll post my problem below :)
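Alongside flagstat, one quick way to see whether any flagged duplicates are left is to test the duplicate bit of the SAM FLAG field directly (0x400, "PCR or optical duplicate"). This is only a minimal sketch, not what Picard or samtools do internally: it assumes the BAM has been converted to SAM text (for example with samtools view), and the function name and the example records are made up for illustration.

```python
DUPLICATE_FLAG = 0x400  # SAM FLAG bit for "PCR or optical duplicate"


def count_duplicates(sam_lines):
    """Return (total_records, flagged_duplicates) for an iterable of SAM text lines.

    Hypothetical helper: header lines (starting with "@") and blank lines are skipped.
    """
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is the FLAG
        total += 1
        if flag & DUPLICATE_FLAG:
            dups += 1
    return total, dups


# Made-up example records: flag 99 has no duplicate bit, 1123 = 99 + 1024 has it set.
example = [
    "@HD\tVN:1.6\n",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*\n",
    "r2\t1123\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*\n",
]
total, dups = count_duplicates(example)
print(f"{total} records, {dups} flagged as duplicate")
```

On real data the same function can be given sys.stdin, with the SAM text piped in from samtools view. If the count is above zero after the removal step, the flagged reads are still in the file.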
Try clumpify.sh from the BBMap suite instead (see "Introducing Clumpify: Create 30% Smaller, Faster Gzipped Fastq Files. And remove duplicates.").
I'll check this out. Next time I'll post the Picard problem, where not all duplicates are really removed.
I'll post it next time.
In the end, I want unique reads with unique coordinates.
Also post examples of remnant duplicates.
Define "unique coordinates" further. Only one read covering every base, or one read mapped starting at each base position?
My definition of unique coordinates: only the start position should be unique. Is there a program that extracts such reads from BAM files? I know I lose the paired-end information, but that is not important for my next step.
I think a very similar question was recently asked here. Let me see if I can find that thread.
Thank you! Later I'll post the Picard results where duplicate reads are not removed.
Thanks! I used awk to get unique start positions :)
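The awk command itself was not posted, so here is a rough Python sketch of the same idea for anyone landing on this thread. It assumes SAM text as input (for example from samtools view -h) and keeps the first mapped read seen for each (reference, start position) pair. The function name is hypothetical. Note that it uses the SAM POS column, which is the leftmost mapped position, so for reverse-strand reads this is not the 5' end, and mate information is ignored, as discussed above.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit for "segment unmapped"


def unique_start_reads(sam_lines):
    """Yield header lines, then the first mapped read per (RNAME, POS).

    Hypothetical helper; assumes tab-separated SAM text lines.
    """
    seen = set()
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        if int(fields[1]) & UNMAPPED_FLAG:
            continue  # unmapped reads have no meaningful start position
        key = (fields[2], fields[3])  # RNAME (column 3) and POS (column 4)
        if key not in seen:
            seen.add(key)
            yield line


# Made-up example records: r1 and r2 share the start chr1:100, so only r1 is kept.
example = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\n",
    "r3\t16\tchr1\t101\t60\t50M\t*\t0\t0\t*\t*\n",
]
kept = list(unique_start_reads(example))
print("".join(kept), end="")
```

Which read survives at each position depends only on input order here. A real workflow might prefer the read with the best mapping quality instead.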