Question

Best way to deal with overlapping read names in merged BAM files

0

7 months ago

shpak.max ▴ 50

I encountered the error "Value was put into PairInfoMap more than once" while running Picard's MarkDuplicates on a BAM file that was created by merging files from three different runs.

I found several threads on Biostars addressing this issue, e.g.

Markduplicates: Value Was Put Into Pairinfomap More Than Once

However, I'm not sure how and where to best implement the suggested corrections of fixing the SM tag (e.g. one reply suggests prefixing each read ID with a tag that indicates the lane, or presumably any other identifier).

Is this something that can be performed on the three BAM files that I generated before merging (perhaps with an appropriate samtools function), or do I need to go back to my FASTQ files and write a script to rename each read? Since this seems to be a common problem, I assume that there must be some simple fix?

picard MarkDuplicates • 721 views

ADD COMMENT • link 7 months ago by shpak.max ▴ 50
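
Before changing anything, it may help to confirm that read names really do collide across the runs. The following is a minimal sketch, not a tested pipeline: `shared_names` is a hypothetical helper, and the `.names` files are assumed to be plain-text lists with one read name per line, extracted from each per-run BAM beforehand.

```shell
#!/bin/sh
# Sketch: print read names that occur in both of two name lists.
# Assumption: each input file holds one read name per line, e.g. created with
#   samtools view run1.bam | cut -f1 > run1.names
# (run1.bam and run1.names are placeholder file names).
shared_names() {
    # De-duplicate each list on its own, so that mates sharing a name within
    # one run are not reported; a name that still appears twice in the
    # combined stream must be present in both lists.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Example call (placeholder file names):
# shared_names run1.names run2.names | head
```

If this prints nothing for every pair of runs, the read names do not overlap and the error probably has a different cause.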

score 0 · Answer 1 · 2024-04-15

0

7 months ago

Pierre Lindenbaum 164k

I'm not sure how and where to best implement the suggested corrections of fixing the SM tag

Use samtools addreplacerg to change the read group of one or more BAM files.

ADD COMMENT • link 7 months ago by Pierre Lindenbaum 164k

0

Could you please direct me to an example that uses addreplacerg across an entire BAM file? I would need to add a read group tag to each read based on the original run/BAM file that it came from, and it's not clear to me from the addreplacerg documentation how to do this.

Additionally, I found another discussion thread on this topic that suggested the use of the samtools flag function - is this a sound approach?

samtools view -f 0x2 -b in.bam > out.bam

ADD REPLY • link 7 months ago by shpak.max ▴ 50

1

not tested:

samtools addreplacerg -r "@RG\tID:ID1\tSM:SAMPLE1" -O BAM -o new1.bam old1.bam
samtools addreplacerg -r "@RG\tID:ID2\tSM:SAMPLE2" -O BAM -o new2.bam old2.bam

ADD REPLY • link 7 months ago by Pierre Lindenbaum 164k
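
Extending the commands above to the three-run case in the question, one possible sketch is to give each per-run BAM its own read group ID before merging. This is untested against real data: `rg_line` and `tag_run` are hypothetical helpers, and `run1.bam`, `run2.bam`, `run3.bam` and `SAMPLE1` are placeholder names. It assumes all three runs come from the same sample, so only the ID differs.

```shell
#!/bin/sh
# Sketch: tag each per-run BAM with a distinct read group, then merge.
# All file names and the sample name are placeholders.
set -eu

# Build an @RG line with literal "\t" separators, the same form used in the
# addreplacerg commands above (samtools converts them to tabs).
rg_line() {
    printf '@RG\\tID:%s\\tSM:%s' "$1" "$2"
}

# $1 = input BAM, $2 = sample name; writes <basename>.rg.bam
tag_run() {
    id=$(basename "$1" .bam)
    samtools addreplacerg -r "$(rg_line "$id" "$2")" -O BAM -o "$id.rg.bam" "$1"
}

# Example run (placeholder names):
# for bam in run1.bam run2.bam run3.bam; do tag_run "$bam" SAMPLE1; done
# samtools merge merged.bam run1.rg.bam run2.rg.bam run3.rg.bam
```

Using the file's base name as the read group ID is only a convenience; any identifier that differs between runs would serve.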

0

Thanks. If I understand correctly, this adds ID1 to the @RG header in the first BAM, etc. Could you please explain the syntax of the \tSM:SAMPLE part?

ADD REPLY • link 7 months ago by shpak.max ▴ 50

0

See: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

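\t is a tab character; it separates the fields of the @RG header line. SM is the field that holds the sample name.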

ADD REPLY • link 7 months ago by GenoMax 147k
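
To make the escaped string from the earlier command concrete, here is a small illustration that expands the `\t` escapes and pulls out the third tab-separated field. It uses only standard shell tools; the variable names are made up for the example.

```shell
#!/bin/sh
# The read group string as it is typed on the command line.
rg_header='@RG\tID:ID1\tSM:SAMPLE1'

# printf's %b expands backslash escapes, so each \t becomes a real tab.
expanded=$(printf '%b' "$rg_header")

# cut splits on tabs by default: field 1 is @RG, field 2 is ID:ID1,
# field 3 is SM:SAMPLE1.
printf '%s\n' "$expanded" | cut -f3
```

This prints `SM:SAMPLE1`, showing that `\tSM:SAMPLE1` is just a tab followed by the sample-name field.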