I received some BAM files in which duplicates were marked using Picard. I'd like to unmark all duplicate reads for a study we're doing. Does anyone know of an existing tool? I haven't found anything in google land.
If there isn't an existing tool, could you provide some guidance on how to unmark them myself? I've read the SAM spec and understand I need to change the bit flag, but I'm not sure of the best way to approach it.
I wouldn't go that far. OS X is essentially a BSD variant with a really nice user interface. Whether and() is available is just a difference between awk implementations (gawk has it, mawk does not), and one can always install the other one.
Section 6.1 of this tutorial from the GATK team uses a modified version of your code to remove the 0x1 flag, but I get various syntax errors. When I instead use your exact code, changing only the flag value, I get:
gawk: cmd. line:1: (FILENAME=- FNR=1) fatal: not enough arguments to satisfy format string
`%s @HD'
^ ran out for this one
[main_samview] fail to read the header from "-".
I'm not familiar with any existing tool (I wouldn't be surprised if Pierre Lindenbaum has written one), but this should be simple enough to code. Depending on the version of awk you have on your system, you may be able to use it directly. In some implementations of awk (not all of them), the command if(and($2,1024)){$2-=1024} will unmark a duplicate. Apply it only to alignment lines (header lines start with @) and set OFS to a tab, since assigning to $2 makes awk rebuild the line using the output field separator. You would then simply samtools view -h foo.bam | awk ...stuff... | samtools view -Sb - > foo.unmarked.bam.
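If your awk lacks and(), the same text-level edit can be sketched in plain Python. This is only an illustration of the flag arithmetic, not an existing tool: the function names unmark_sam_line and unmark_stream are made up for this example, and it assumes tab-separated SAM text with the FLAG in the second column, as in the SAM spec.

```python
DUPLICATE_FLAG = 0x400  # 1024, the "PCR or optical duplicate" bit in the SAM FLAG field


def unmark_sam_line(line: str) -> str:
    """Return one SAM text line with the duplicate bit cleared (hypothetical helper)."""
    if line.startswith("@"):
        # Header lines carry no FLAG field, so pass them through untouched.
        return line
    fields = line.split("\t")
    # Clear only bit 0x400; every other flag bit is preserved.
    fields[1] = str(int(fields[1]) & ~DUPLICATE_FLAG)
    return "\t".join(fields)


def unmark_stream(src, dst) -> None:
    """Copy SAM text from src to dst, unmarking duplicates (hypothetical helper)."""
    for line in src:
        dst.write(unmark_sam_line(line))
```

Calling unmark_stream(sys.stdin, sys.stdout) from a small script would let it take the place of the awk step between the two samtools view commands.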
The other simple way to do this is with Python using pysam. Pysam allows reading from and writing to SAM/BAM files from within Python, so you could perform a similar operation there.
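A minimal sketch of the pysam route might look like the following. It assumes pysam is installed, and the function names and file paths are placeholders invented for this example rather than anything from pysam itself. The import sits inside the function so the flag helper can be tried without pysam present.

```python
DUPLICATE_FLAG = 0x400  # 1024, the duplicate bit in the SAM FLAG field


def clear_duplicate_bit(flag: int) -> int:
    """Return the flag with bit 0x400 cleared and all other bits unchanged."""
    return flag & ~DUPLICATE_FLAG


def unmark_bam(in_path: str, out_path: str) -> int:
    """Write a copy of in_path with no reads marked as duplicates.

    Hypothetical helper; returns the number of reads that were unmarked.
    """
    import pysam  # third-party; assumed to be installed

    changed = 0
    with pysam.AlignmentFile(in_path, "rb") as src, \
            pysam.AlignmentFile(out_path, "wb", template=src) as dst:
        # until_eof=True reads the file sequentially, so no index is needed
        # and unmapped reads are included.
        for read in src.fetch(until_eof=True):
            new_flag = clear_duplicate_bit(read.flag)
            if new_flag != read.flag:
                read.flag = new_flag
                changed += 1
            dst.write(read)
    return changed
```

For example, unmark_bam("foo.bam", "foo.unmarked.bam") would mirror the file names used in the awk pipeline above.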
Hello me.mark!
It appears that your post has been cross-posted to another site: SEQanswers
This is typically not recommended as it runs the risk of annoying people in both communities.
Thank you for the reminder. My colleague and I were both looking for a solution and it looks like he also posted a question about it. Please excuse the duplicate post.
Edit: Thank you everyone for your helpful responses!
That explains the very different usernames, no worries!
This post explains how to do this using Picard: Remove flags of MarkDuplicates (picard)