After looking through the code, this appears to be a bug in the most recent (0.1.19) version of samtools. It arises as follows:
The `out.prefix` parameter used in samtools sort ends up being passed to the `bam_sort_core_ext` function. There, unless output is going to stdout, the output filename is created with `sprintf(fnout, "%s%s", prefix, suffix);`. Here, `suffix` is a pointer to ".bam" that is incremented by 4 (i.e., made to point to the terminating NUL character, so that it is effectively an empty string) when the -f option is used from the command line.
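To illustrate the suffix handling described above, here is a minimal sketch rather than the actual samtools source. The helper names `choose_suffix` and `output_name` are made up for this example, and `full_path` is an assumed name for the flag that -f sets.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: picks the suffix the way the answer describes. */
const char *choose_suffix(int full_path)
{
    const char *suffix = ".bam";
    if (full_path)
        suffix += 4; /* skips '.', 'b', 'a', 'm'; now points at the '\0' */
    return suffix;
}

/* Hypothetical helper: builds the final output name into buf. */
void output_name(char *buf, size_t len, const char *prefix, int full_path)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s%s", prefix, choose_suffix(full_path));
}
```

With the prefix out and no -f, this gives out.bam. With -f and the prefix out.bam, the name is used exactly as given, because the suffix is an empty string rather than a NULL pointer.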
Similarly, the temporary filenames (fns[]) that are to be merged (but not written to!) are generated in the same function using a nearly identical method:
    for(i = 0; i < n_files; ++i) {
        fns[i] = (char*)calloc(strlen(prefix) + 20, 1);
        sprintf(fns[i], "%s.%.4d%s", prefix, i, suffix);
    }
The problem is in the worker threads that perform the actual sorting. There, the temporary filenames are created with `sprintf(name, "%s.%.4d.bam", w->prefix, w->index);`
You can see, then, that the temp files will always end in XXXX.bam, even though with -f the merge function is told that they end in just XXXX. The simplest fix would be to change how fns is set, so that it always uses the .bam extension and matches what the worker threads write.
The temp files get deleted anyway and the -f option is still honored this way, but things continue to work. I'll double check that this is correct and file a bug report.
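The mismatch, and the kind of fix described above, can be sketched as follows. This is an illustration under assumptions, not the patch that went into samtools: the three function names and the `full_path` flag are hypothetical, and only the format strings for the worker and the 0.1.19 merge list are taken from the code quoted in this answer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name a worker thread writes its sorted chunk to: always ends in ".bam". */
void worker_tmp_name(char *buf, size_t len, const char *prefix, int index)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s.%.4d.bam", prefix, index);
}

/* Name the merge step expects in 0.1.19: the extension depends on -f. */
void merge_name_buggy(char *buf, size_t len, const char *prefix, int index,
                      int full_path)
{
    const char *suffix = ".bam";
    assert(buf != NULL && len > 0);
    if (full_path)
        suffix += 4; /* empty string when -f is given */
    snprintf(buf, len, "%s.%.4d%s", prefix, index, suffix);
}

/* Assumed fix: build the merge list with a literal ".bam" as well. */
void merge_name_fixed(char *buf, size_t len, const char *prefix, int index)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s.%.4d.bam", prefix, index);
}
```

For prefix out and chunk 3, the worker writes out.0003.bam, while the 0.1.19 merge list with -f asks for out.0003, a file that does not exist. With the assumed fix, both sides agree on out.0003.bam, and -f still only affects the final output name.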
Edit: I've confirmed this and filed a bug report. The code fix is very simple.
Edit2: The earlier formatting issues in this answer seem to have been fixed by using HTML directly rather than relying on the forum's formatting. It's good to know that the forum can't deal with code blocks inside ordered lists.
Edit3: Apparently this was fixed 6 months ago in the GitHub repository. Grrr, it would have been nice had they just released a new version with bug fixes. I know they're working on a big overhaul to switch to using htslib, but still. I'll update my bug report and see if I can help get an intermediate bug-fix release pushed out (I don't know how amenable the developers are to that).
I know it's frustrating to be tripped up by bugs that turn out to be long since fixed. It might have been better if we had been making releases from the 0.1.x branch over the last few months, but with the htslib-based samtools now imminent, we're reluctant to ask people to update to an untested-by-them 0.1.20 and then to immediately update again to a release from the new branch.
Fair enough. This is an easy enough bug to simply avoid anyway :) Good luck getting the htslib version finished. Is the "develop" branch the one being readied for release or is it a different one? I'd be happy to help since I use the samtools code a good bit.
"The error message from BWA is" : how do you know it's an error from bwa ?
Actually, the error message starts with "[bam_merge_core]", which I did not paste last time, so I am quite sure.
That's from samtools then (bam_merge_core is a samtools internal function). This may be a samtools bug. Which version are you using?
Edit: See my answer below. This is definitely a bug.