I encountered the error "Value was put into PairInfoMap more than once" while running Picard's MarkDuplicates on a BAM file that was created by merging files from three different runs.
I found several threads on Biostars addressing this issue, e.g.
Markduplicates: Value Was Put Into Pairinfomap More Than Once
However, I'm not sure how and where best to implement the suggested corrections, such as fixing the SM tag. For example, one reply suggests prefixing each read ID with a tag that indicates the lane (or presumably any other identifier).
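The read-renaming idea from that reply can be sketched on SAM text instead of going back to FASTQ. This is a minimal illustration, assuming SAM text input (for example from `samtools view -h`) and a hypothetical run label such as `run1`. Both mates of a pair share a QNAME, so they receive the same prefix and stay paired.

```python
from typing import Iterable, Iterator


def prefix_read_names(lines: Iterable[str], run_label: str, sep: str = ":") -> Iterator[str]:
    """Yield SAM lines with run_label + sep prepended to every QNAME.

    Header lines (starting with '@') and blank lines pass through unchanged.
    run_label is a hypothetical identifier for the run/lane of origin.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            yield line
            continue
        # QNAME is the first tab-separated SAM column; the rest is untouched.
        qname, rest = line.split("\t", 1)
        yield f"{run_label}{sep}{qname}\t{rest}"


if __name__ == "__main__":
    sam = ["@HD\tVN:1.6", "r1\t99\tchr1\t1\t60\t4M\t=\t10\t13\tACGT\tIIII\n"]
    for out in prefix_read_names(sam, "run1"):
        print(out)
```

Renaming makes names unique across runs, but it does not record which run a read came from in a standard field; a read group does that.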
Can this be done on the three BAM files I generated before merging (perhaps with an appropriate samtools command), or do I need to go back to my FASTQ files and write a script to rename each read? Since this seems to be a common problem, I assume there must be a simple fix.
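Before changing anything, a pre-merge check can show whether the three inputs actually share read names. To my understanding (an assumption worth checking against the Picard source), MarkDuplicates tracks mates by read group together with read name, so pairs with the same name and no distinguishing read group can collide after merging. This sketch assumes each input has been dumped to SAM text; `colliding_read_names` and the run labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def colliding_read_names(inputs: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return {QNAME: labels} for read names that occur in more than one input.

    inputs maps a hypothetical run label to the lines of a SAM text file.
    Header lines (starting with '@') and blank lines are skipped.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for label, lines in inputs.items():
        for line in lines:
            line = line.rstrip("\n")
            if not line or line.startswith("@"):
                continue
            qname = line.split("\t", 1)[0]  # QNAME is the first SAM column
            seen[qname].add(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    run1 = ["@HD\tVN:1.6", "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII"]
    run2 = ["read1\t99\tchr1\t500\t60\t4M\t=\t600\t104\tTTTT\tIIII",
            "read2\t99\tchr1\t900\t60\t4M\t=\t950\t54\tGGGG\tIIII"]
    print(colliding_read_names({"run1": run1, "run2": run2}))
```

An empty result suggests the collision has another cause, such as the same reads being merged in twice.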
Could you please point me to an example that uses addreplacerg across an entire BAM file? I would need to add a read group tag to each read based on the original run/BAM file it came from, and it's not clear to me from the addreplacerg documentation how to do this.
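To my understanding, `samtools addreplacerg -r` with an @RG line adds that line to the header and, in its default mode, tags every read with that read group ID. Running it once per input BAM with a different ID before merging is the usual pattern. The per-read effect can be sketched in Python on SAM text. This illustrates the idea only; it is not samtools' implementation, and the IDs `run1` and `SAMPLE` are hypothetical.

```python
from typing import Iterable, Iterator


def add_read_group(lines: Iterable[str], rg_id: str, sample: str) -> Iterator[str]:
    """Yield SAM lines with an @RG header line added and every read tagged RG:Z:<rg_id>.

    Any existing RG tag on a read is replaced. The new @RG line is emitted at
    the end of the header (before the first alignment line). An @RG line with
    the same ID already in the header is NOT detected here.
    """
    rg_line = f"@RG\tID:{rg_id}\tSM:{sample}"
    header_done = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            yield line
            continue
        if not line:
            continue  # skip blank lines
        if not header_done:
            yield rg_line
            header_done = True
        fields = line.split("\t")
        # Keep the 11 mandatory columns and optional tags, dropping any old RG tag.
        kept = [f for f in fields[11:] if not f.startswith("RG:Z:")]
        yield "\t".join(fields[:11] + kept + [f"RG:Z:{rg_id}"])
    if not header_done:
        yield rg_line  # header-only input


if __name__ == "__main__":
    sam = ["@HD\tVN:1.6", "@SQ\tSN:chr1\tLN:1000",
           "r1\t99\tchr1\t1\t60\t4M\t=\t10\t13\tACGT\tIIII\tRG:Z:old\tNM:i:0"]
    for out in add_read_group(sam, "run1", "SAMPLE"):
        print(out)
```

With a distinct ID per run and a shared SM value, the merged file still describes one sample while each read keeps its run of origin.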
Additionally, I found another discussion thread on this topic that suggested using the samtools flag function. Is this a sound approach?
not tested:
Thanks. If I understand correctly, this adds ID1 to the RG header in the first BAM, and so on. Could you please explain the syntax of \tSM:SAMPLE?
See: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
\t is a tab.
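In an @RG string, each \t separates TAG:VALUE fields, so a string like `@RG\tID:ID1\tSM:SAMPLE` (assumed here from the discussion, since the original command is not shown) describes a read group with ID `ID1` and sample name (SM) `SAMPLE`. A small sketch that expands the escape and splits the line into fields; `parse_rg` is a hypothetical helper.

```python
from typing import Dict


def parse_rg(rg: str) -> Dict[str, str]:
    """Split an @RG header string into a {TAG: VALUE} dict.

    Accepts real tab characters or the literal two-character escape
    backslash-t, as typed inside single quotes on a shell command line.
    """
    expanded = rg.replace("\\t", "\t")  # turn a literal backslash-t into a real tab
    fields = expanded.split("\t")
    if fields[0] != "@RG":
        raise ValueError("not an @RG line")
    # Split on the first colon only, since values may themselves contain colons.
    return dict(f.split(":", 1) for f in fields[1:])


if __name__ == "__main__":
    print(parse_rg(r"@RG\tID:ID1\tSM:SAMPLE"))  # {'ID': 'ID1', 'SM': 'SAMPLE'}
```

Other commonly used read group fields (LB, PL, PU) follow the same TAG:VALUE pattern and are described in the GATK article linked above.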