samtools rmdup error
7.0 years ago
ttsutsui1028 ▴ 20

I am trying to remove PCR duplicates from my SAM file. When I run samtools rmdup, it aborts partway through.

samtools rmdup my_data.sam my_data_rmdup.sam
[bam_rmdup_core] processing reference chr6...
....
samtools(5713,0x7fffa2def340) malloc: *** error for object 0x7f999f600508: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

I can't solve this issue. My setup is as follows:

ProductName:    Mac OS X
ProductVersion: 10.13.2
32G memory
samtools Version: 1.6 (using htslib 1.6)

I would appreciate any advice.

Thank you

samtools WGBS • 5.1k views

Try again after converting the file to BAM.
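A minimal sketch of that suggestion, assuming the input SAM has a proper header and that samtools 1.x is on the PATH. The function name `rmdup_from_sam` and the temporary file name are made up for illustration; only `samtools sort` and `samtools rmdup` are real commands.

```shell
# Hypothetical helper: coordinate-sort a SAM file into BAM, then run rmdup on it.
# Usage: rmdup_from_sam <in.sam> <out.bam>
rmdup_from_sam() {
    in_sam=$1
    out_bam=$2
    # Placeholder name for the intermediate sorted BAM
    sorted_bam="${out_bam%.bam}.sorted.tmp.bam"

    # samtools sort reads SAM directly and writes coordinate-sorted BAM
    samtools sort -o "$sorted_bam" "$in_sam" || return 1

    # rmdup expects coordinate-sorted BAM; it assumes paired-end reads
    # unless -s (single-end) is given
    samtools rmdup "$sorted_bam" "$out_bam" || return 1
}
```

With the files from the question this would be called as `rmdup_from_sam my_data.sam my_data_rmdup.bam`. Whether this avoids the malloc abort is not something the thread confirms, so treat it as a first thing to try.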

7.0 years ago
h.mon 35k

samtools rmdup should not be used:

samtools rmdup [-sS] <input.srt.bam> <out.bam>

This command is obsolete. Use markdup instead.

In addition, when using samtools markdup, you have to sort the .bam (not .sam) by coordinate and run samtools fixmate.

samtools markdup [-l length] [-r] [-s] in.algsort.bam out.bam

Mark duplicate alignments from a coordinate sorted file that has been run through fixmate with the -m option. This program relies on the MC and ms tags that fixmate provides.


It is perfectly fine to use rmdup, especially on larger datasets where time and I/O are limiting factors. rmdup is not perfect, but the ultimate differences from Picard's MarkDuplicates, at least in WGS, are small (source here).


While we are at it... Recently I've been using bamsort from biobambam to sort and to mark or remove duplicates. The nice thing is that you can stream alignments from bwa to bamsort and get a sorted, duplicate-marked, and indexed BAM file almost for free, given that it all works in a stream. bamsort also gives duplicate metrics similar to Picard MarkDuplicates. E.g.:

bwa mem ref.fa R1.fq.gz R2.fq.gz \
| bamsort inputformat=sam markduplicates=1 rmdup=0 fixmates=1 inputthreads=8 outputthreads=8 \
M=aln.dupmetrics.txt O=aln.bam index=1 indexfilename=aln.bam.bai

(Of course, the aligner doesn't have to be bwa, as long as the input to bamsort is SAM or BAM collated by read name.)


Cool, thanks for mentioning that tool. Will check it out. I always found it a pain that Picard MarkDuplicates and Sambamba markdup are not pipe-able, especially for cohorts of WGS.


Nonetheless, rmdup works on a sorted BAM, not on a SAM file.


Agreed, except that samtools fixmate needs a name-sorted file; a coordinate-sorted file is not accepted.
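Putting the sort-order requirements from this thread together, the markdup route could be sketched roughly as below. This assumes samtools 1.6 or later (where `markdup` and `fixmate -m` are available); the function name `markdup_workflow` and the intermediate file names are placeholders, not anything from the samtools documentation.

```shell
# Hypothetical helper: name-sort -> fixmate -m -> coordinate-sort -> markdup.
# Usage: markdup_workflow <in.bam|in.sam> <out.bam>
markdup_workflow() {
    in_aln=$1
    out_bam=$2
    # Placeholder prefix for intermediate files
    prefix=${out_bam%.bam}

    # 1. fixmate needs reads grouped by name
    samtools sort -n -o "${prefix}.namesort.bam" "$in_aln" || return 1

    # 2. -m adds the ms (mate score) tag that markdup relies on
    samtools fixmate -m "${prefix}.namesort.bam" "${prefix}.fixmate.bam" || return 1

    # 3. markdup needs coordinate-sorted input
    samtools sort -o "${prefix}.possort.bam" "${prefix}.fixmate.bam" || return 1

    # 4. mark duplicates; adding -r here would remove them instead
    samtools markdup "${prefix}.possort.bam" "$out_bam" || return 1
}
```

For example, `markdup_workflow my_data.sam my_data_markdup.bam`. The intermediate files are left in place so each step can be inspected; delete them once the output looks right.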
