What versions of convert2bed, sort-bed, and samtools are you using?
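For reference, each of these tools can report its version from the command line; something like the following should work with reasonably recent builds:

$ convert2bed --version
$ sort-bed --version
$ samtools --version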
Behind the scenes, convert2bed (bam2bed) passes its converted output to sort-bed, which creates properly sorted BED.
One possibility is that your /tmp folder might be on a disk mount/share smaller than what is needed to store intermediate uncompressed BED data. Because BAM files being processed are often larger than system memory, sort-bed uses 2 GB of system memory by default, and then uses your /tmp folder (or whatever your operating system's temporary folder is set to) to store intermediate data for a merge sort.
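If it helps to picture what sort-bed is doing, here is a minimal sketch of running it by itself with an explicit memory cap and temporary directory; /scratch/tmp is a placeholder path, not something from this thread:

$ sort-bed --max-mem 2G --tmpdir /scratch/tmp unsorted.bed > sorted.bed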
If running out of space in /tmp while sorting is the issue, there are a few things you can do by setting appropriate convert2bed/bam2bed options.
You could set --max-mem to 40G or similar, since you have a system with 384 GB of RAM. Then all the sorting work on uncompressed BED would be done in memory, which will be faster, and you wouldn't need to worry as much about using or running out of space in /tmp.
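As a sketch, assuming an input called reads.bam, that might look like:

$ bam2bed --max-mem 40G < reads.bam > reads.bed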
Or, you could set the temporary sort directory via --sort-tmpdir to a folder on a disk share that has at least 40 GB of free space.
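For example (again with reads.bam, and with /scratch/tmp standing in for a hypothetical folder on a larger share):

$ bam2bed --sort-tmpdir /scratch/tmp < reads.bam > reads.bed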
Or, you could disable BED sorting altogether via --do-not-sort. I really don't recommend this, since the sort order of a BAM file can be unknown or non-lexicographical, and the sort order of the resulting BED file will then be unknown or incorrect, possibly making it unusable for set operations with bedops, bedmap, etc.
I would only suggest using --do-not-sort if you pipe to awk or cut to remove columns, e.g.:
$ bam2bed --do-not-sort < reads.bam | cut -f1-6 | sort-bed --max-mem 20G - > reads.sorted.bed
We try to be non-lossy about conversion between formats. You may only be interested in a subset of columns, however, so this is a fast way to discard columns you don't want or need, with the benefit that sort-bed has a lot less input to sort within memory.
If your BAM file is indexed, a different option is to convert BAM to BED via one of several parallelized options, such as via GNU Parallel or via a SLURM or SGE computational cluster. This splits up the conversion work by chromosome, and those individual, per-chromosome conversion tasks are going to be much smaller. Conversion will go much faster, too, since we use some tricks that basically reduce the overall job to the time taken to convert the largest chromosome (i.e., chr1 tends to be the largest).
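As a rough sketch of the GNU Parallel route, assuming a coordinate-sorted and indexed reads.bam and that samtools, bam2bed, parallel, and bedops are all on your PATH (BEDOPS also ships wrapper scripts such as bam2bed_gnuParallel that do roughly this, but the idea is the same):

# List reference sequences with at least one mapped read
samtools idxstats reads.bam | awk '$3 > 0 { print $1 }' > chromosomes.txt

# Convert each chromosome in its own job, eight at a time; each
# per-chromosome BED comes out sorted via bam2bed's internal sort-bed step
parallel --jobs 8 'samtools view -b reads.bam {} | bam2bed > per_chrom.{}.bed' :::: chromosomes.txt

# Take the sorted union of the per-chromosome files into one sorted BED
bedops --everything per_chrom.*.bed > reads.sorted.bed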
In any case, there might be a bug in the BAM conversion routines, but I couldn't begin to troubleshoot without knowing the versions of your binaries and having some sample input to reproduce the error. So take a look at the options above and see if adjusting memory settings helps. If parallelization is an option for you, I definitely recommend that route, if your time is valuable and you have the computational resources.
The error message and some more information on how you mapped the file would be helpful. My guess would be that the BAM file is truncated or otherwise malformed.
Thanks for the response.
As I mentioned, there is no error message. convert2bed appears to run successfully but only generates an empty BED file.
Mapping of paired-end reads onto hg19 was performed with bwa mem. Duplicates were removed with Picard MarkDuplicates. No errors were generated during these steps. The resulting BAM file (after de-duplication and sorting) can be used for peak calling (macs2), suggesting the BAM file is okay in general.
By the way, the aggregate size of files in the temp folder is around 35 GB (the input BAM file is around 20 GB, and the resulting BED file is around 100 GB). I thought I'd post that in case anyone is interested.
Please use ADD COMMENT/ADD REPLY when responding to existing posts (or edit the original question with new relevant information) to keep threads logically organized.

A worst-case scenario may require sorting an uncompressed file that is around 100 GB in size. (Imagine a BAM file where every read is randomly positioned.) So you might need a temporary folder that can hold up to that much data when extracting, converting, and sorting that large of a BAM file. Converting to BED in parallel, one chromosome at a time, is a good way to go with large BAM files, if they are indexed or can be indexed.
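As a quick sanity check before kicking off a large conversion, you can compare those sizes against the free space on the temporary folder; this assumes your system follows the usual TMPDIR convention, falling back to /tmp:

$ df -h "${TMPDIR:-/tmp}"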