I have some reads that contain a molecular index so I can know whether they are PCR duplicates. I am going to use the 0x400 flag as specified in the SAM spec to mark them as optical/PCR duplicates.
Should I mark all of the reads in a group (having the same POS and molecular tag) with that flag or should I leave the one with the highest quality (or by whatever metric) unmarked?
I will be sending the result to the GATK SNP-calling pipeline.
I'm not sure I understand: are your SAM records already marked with the flag 0x4? Or are you looking for a method to set the flag according to the chrom/pos/your-index?
No, they are mapped reads. I want to set the flag 0x400 (1024) to show that they are PCR duplicates.
Hi Brent,
Did you ever get this working?
Is the code available for download somewhere? I need it but don't want to re-invent the wheel.
Thank you
Hi Brent,
I know this is old, but in case anyone else needs this, I have code here that adds the UMI to the BAM file, in the RX and QX fields, by reading it from the original FASTQ: https://github.com/mbusby/AddUMIsToBam
I hear from the Picard team that MarkDuplicates will handle this in its duplicate marking, but I have not tested that myself yet.
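On the original question of which reads in a group to flag: here is a minimal sketch, not tested on real data, that assumes the common convention of leaving one representative per group unmarked and setting 0x400 on the rest. Downstream tools usually skip reads carrying 0x400, so flagging every read in a group would likely drop that molecule entirely. The `Read` class, the `qual_sum` score, and the grouping key (chrom, POS, strand, UMI) are my own assumptions for illustration; real code would read the records and the RX tag from the BAM, and would probably want the 5' end rather than POS for reverse-strand reads.

```python
from collections import defaultdict
from dataclasses import dataclass

DUP_FLAG = 0x400      # SAM flag bit for PCR/optical duplicate (decimal 1024)
REVERSE_FLAG = 0x10   # SAM flag bit for reverse-strand alignment


@dataclass
class Read:
    """Hypothetical stand-in for a SAM record; not a real library type."""
    name: str
    chrom: str
    pos: int
    flag: int
    umi: str        # molecular index, e.g. taken from the RX tag
    qual_sum: int   # assumed ranking metric: sum of base qualities


def mark_duplicates(reads):
    """Set 0x400 on every read except the best-scoring one in each group."""
    groups = defaultdict(list)
    for r in reads:
        reverse = bool(r.flag & REVERSE_FLAG)
        groups[(r.chrom, r.pos, reverse, r.umi)].append(r)

    for members in groups.values():
        # Keep the highest-scoring read unmarked; ties go to the first seen.
        best = max(members, key=lambda r: r.qual_sum)
        for r in members:
            if r is best:
                r.flag &= ~DUP_FLAG
            else:
                r.flag |= DUP_FLAG
    return reads


reads = [
    Read("r1", "chr1", 100, 0, "ACGT", 300),
    Read("r2", "chr1", 100, 0, "ACGT", 350),   # best of the ACGT/forward group
    Read("r3", "chr1", 100, 0, "TTGA", 280),   # different UMI, its own group
    Read("r4", "chr1", 100, 16, "ACGT", 310),  # reverse strand, its own group
]
mark_duplicates(reads)
flags = {r.name: r.flag for r in reads}
print(flags)
```

In this toy input only r1 ends up with 0x400 set; r2 stays as the representative of its group, and r3 and r4 are singletons.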