bedtools window - killed: 9 error
7.0 years ago
spiral01 ▴ 110

I am trying to use the bedtools window command to obtain counts of the number of variants in each window of the hg19 human vcf files. Here is the command:

bedtools window -a 50kb.bed -b chr1.vcf.gz -c > coverage.txt

This results in the following error:

Killed: 9

However, the command works fine on some of the smaller chromosomes (e.g. chr19) without the error occurring. What is causing this error and how can I stop it from happening?


This issue is related to RAM or, more likely, to available disk space. Rather than letting the process crash your operating system, the OS kills it with signal 9 (SIGKILL).

On which OS are you running this? If Linux/UNIX, is it being run on a shared system?
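
If it helps, you can check how much free memory and disk space you have before re-running the command. These are standard Linux and macOS tools, nothing bedtools-specific:

$ free -h     # free RAM and swap (Linux)
$ vm_stat     # memory statistics (macOS, which has no free command)
$ df -h .     # free disk space on the current filesystem (both)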


I am running this from the terminal on a Mac. Is this an issue with bedtools, then? I have worked on these same large files with other tools (bcftools, vcftools, etc.) with no issues. Does bedtools unzip the file before working on it?


Yes, I assume that it unpacks the file into RAM and then performs the operation. As your chr1 is 30 GB unpacked, though, you would require >30 GB of RAM. It may actually work if you unpack it to the hard disk first and then re-run the bedtools command.
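
For example, something along these lines (filenames follow your original command, and this assumes you have roughly 30 GB of free disk space for the uncompressed copy):

$ gunzip -c chr1.vcf.gz > chr1.vcf
$ bedtools window -a 50kb.bed -b chr1.vcf -c > coverage.txt
$ rm chr1.vcf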

There are probably other fancy ways of doing this to avoid excessive memory usage.


Another option:

$ gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" > chr1.bed
$ bedmap --echo --count --delim '\t' 50kb.bed chr1.bed > answer.bed
$ rm chr1.bed

Or to avoid creating an intermediate file:

$ bedmap --echo --count --delim '\t' 50kb.bed <( gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" ) > answer.bed

The directory /some/large/dir should be large enough to store chr1.vcf.


Hi Alex, I did try this, but feeding such large unzipped VCF files into memory isn't feasible and leads to the system crashing (the VCF file is 1.2 GB zipped but >30 GB unzipped).


If you use the second approach, which relies on standard Unix streams, then BEDOPS only uses ~2 GB of RAM, not the 30 GB that other approaches may require. That memory usage comes from sorting, and you can adjust it downwards via --max-mem=<value> in convert2bed or vcf2bed if you have less than 4 GB of RAM. Final disk usage should be minimal, not much more than 50kb.bed plus answer.bed. Intermediate disk usage for the temporary files created during sorting may be about 30 GB. Hope this helps.
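
For instance, a variant of the streaming command above that caps the sort memory might look like this (the 2G value is just an example; check vcf2bed --help on your install for the exact option spelling):

$ bedmap --echo --count --delim '\t' 50kb.bed <( gunzip -c chr1.vcf.gz | vcf2bed --max-mem=2G --sort-tmpdir="/some/large/dir" ) > answer.bed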


Thanks, this works! Would you be able to walk me through the command? I have never used brackets the way you have here:

<( gunzip -c chr1.vcf.gz | vcf2bed --sort-tmpdir="/some/large/dir" ) >

Is that saying unzip the file and pipe it to the tmpdir, before piping it to the main bedmap command?


It is bash notation called process substitution: http://tldp.org/LDP/abs/html/process-sub.html

The process substitution I show above extracts the VCF and converts it to BED format, which is written to standard output that the larger bedmap command consumes as one of its inputs.

Another way to think about it is that the process substitution creates a temporary, transitory file that can be used where a filename would normally be specified. This file exists only as long as bedmap needs it to do its work, and it streams a small amount of data at a time to bedmap, which reduces the memory overhead considerably.
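
If you want a minimal, unrelated example of the same notation to experiment with (file1.txt and file2.txt are just placeholders), process substitution works anywhere a filename is expected:

$ diff <(sort file1.txt) <(sort file2.txt)

Here each <( ... ) appears to diff as a regular file, but the sorted output is streamed rather than written to disk.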


Thank you, that's an excellent explanation.
