Hi,
Is it possible to remove the MarkDuplicates flags (not the sequences) from a BAM file? If so, how?
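You can use Picard RevertSam for this. The tool resets various attributes of a BAM file, including duplicate information, which is controlled by REMOVE_DUPLICATE_INFORMATION=true. Note that, as far as I know, RevertSam by default also strips alignment information and restores original base qualities, so if you only want to clear the duplicate flags you will probably also want REMOVE_ALIGNMENT_INFORMATION=false and RESTORE_ORIGINAL_QUALITIES=false. Check the options for your Picard version.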
Example command:
java -Xmx7g -jar ~/tools/picard/picard-tools-1.118/RevertSam.jar OUTPUT=UnmarkedDuplicates.bam INPUT=MarkedDuplicates.bam REMOVE_DUPLICATE_INFORMATION=true
Alternatively, if your awk is gawk (the and() function is a gawk extension, so this depends on which awk is installed), something like the following should work:
samtools view -h foo.bam | awk 'BEGIN{FS=OFS="\t"} !/^@/ && and($2,1024) {$2-=1024} {print}' | samtools view -Sbo foo.unmarked.bam -
I think Macs ship a BSD awk rather than gawk, and it has no and() function, so this doesn't work there unless you install gawk.
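For an awk without and(), a minimal sketch that tests the duplicate bit (0x400 = 1024) with plain integer arithmetic might look like the following. The function name unmark_duplicates is made up for illustration, and the sketch assumes tab-separated SAM text on stdin, as produced by samtools view -h.

```shell
# Hypothetical helper: clear the duplicate bit (0x400 = 1024) in the FLAG
# column of SAM text read from stdin. Uses only POSIX awk features.
unmark_duplicates() {
    awk 'BEGIN { FS = OFS = "\t" }
         # Header lines start with "@" and pass through unchanged.
         /^@/ { print; next }
         # int(FLAG / 1024) is odd exactly when bit 0x400 is set.
         int($2 / 1024) % 2 == 1 { $2 -= 1024 }
         { print }'
}
```

It would slot into the same pipeline, for example: samtools view -h foo.bam | unmark_duplicates | samtools view -b -o foo.unmarked.bam -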
duplicate of Tool to unmark duplicates