Objective: to convert vcf to bed
Tool used: bedops
Command:
/home/bedops/bin/convert2bed -i vcf < my.vcf > my.bed
Error encountered:
Error: Could not allocate space for VCF INFO string
QUESTIONS:
A. What is the source of the error? A space/memory crunch? A quick Google search of the error message pointed to this source code:
char *info_str = NULL;
info_str = malloc(C2B_MAX_LONGER_LINE_LENGTH_VALUE);
if (!info_str) {
    fprintf(stderr, "Error: Could not allocate space for VCF INFO string\n");
    exit(ENOMEM); /* Not enough space (POSIX.1) */
}
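So the message appears when a single malloc of C2B_MAX_LONGER_LINE_LENGTH_VALUE bytes fails. As a first check on the input side, this one-liner reports the longest line in the file (a minimal sketch; the actual buffer size is a compile-time constant in the convert2bed source):

# Print the length of the longest line in the VCF
awk '{ if (length($0) > max) max = length($0) } END { print max }' my.vcf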
B. What is the difference between vcf2bed and convert2bed? Which will solve my purpose?
The vcf2bed script is a convenient wrapper around convert2bed. You could run vcf2bed < in.vcf > out.bed, for instance.

It looks like the host computer ran out of memory to allocate to convert2bed at this step. It might be a bug, but I'd need to see the VCF file to be sure, so if you can post your VCF file somewhere that I can look at it, that would be helpful.

I'm about to release v2.4.21 with some new features and fixes, so this is a good time for me to look at this stuff. Let me know.
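In the meantime, it may help to rule out an environment limit: malloc can fail not only when physical memory is exhausted, but also when a per-process limit caps the address space. A minimal diagnostic sketch with standard Linux tools (the file name my.vcf is a placeholder):

# How much memory is actually free on the machine?
free -h

# Is a per-process virtual-memory limit set? "unlimited" is typical;
# a finite value here can make malloc fail even with free RAM available.
ulimit -v

# Peak memory of the conversion itself (GNU time, usually /usr/bin/time on Linux)
/usr/bin/time -v convert2bed -i vcf < my.vcf > my.bed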
Thank you for the information on the difference between the two. Is there any way I can allocate more memory or assign more threads? Due to the confidential nature of the data, I will not be able to share it. What I can tell you at this point is that the file was created with GATK (HaplotypeCaller) and that the VCF version is 4.2.
Can you try it like this?
cat your.vcf | /home/bedops/bin/convert2bed -i vcf - > my.bed
Regarding memory, you can use -m, but as the help says, it only applies to sorting the BED output.

It's not a question of allocating more memory through a setting. The error means that the computer you ran this on had no more memory left to allocate to the conversion program. Were you running another program at the same time that would use a lot of memory?
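To take the sort step (and its memory use) out of the equation entirely, you could also convert without sorting and then sort separately with an explicit memory cap. A sketch, assuming your convert2bed build supports the --do-not-sort option:

# Conversion only; skip the internal sort step
convert2bed --do-not-sort -i vcf < my.vcf > my.unsorted.bed

# Sort separately, capping the sort's memory use (sort-bed ships with BEDOPS)
sort-bed --max-mem 2G my.unsorted.bed > my.bed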
I have been having the same issue, including this one: Error: Could not allocate space for VCF FORMAT string. I am running the command on my university server and it gives me the same error!
If you are able to share details about your setup and input, that would be helpful. Thanks for the feedback.
It's a Linux server at our uni, which I access over an ssh connection. I use a Mac with 16GB RAM and an i5 processor. The input is a VCF file that has undergone filtering steps in plink (maf etc.), so it's in 4.2 format according to the header. If I remove the header completely (i.e., all the # lines), then I get this error: Could not allocate space for VCF SAMPLE string.
The first few lines of the file -
"If I remove the header completely (i.e., all the # lines) then I get this error"
Why are you removing the header before conversion?
I just wanted to see what was wrong with my INFO or FORMAT string. But the normal VCF file with headers still produced the same errors in BEDOPS. So it would be great if you could tell me what's causing this problem.
If you can post your VCF file somewhere I can download it, and tell me more about your setup (version of operating system, BEDOPS, etc.) I can try to reproduce the problem and troubleshoot further. At the moment, I don't have enough data to confirm or repeat the problem, which I will need in order to fix it. Thanks for the feedback.
I'm getting the same error and I can confirm that my system is not running out of memory. Here are some details:
System:
Software: bedops version 2.4.20
I had a batched job running, and when I went to inspect the results I noticed that I was missing data for chromosomes 1, 2 and 5; all others had completed successfully. I went to the logs and saw the same error reported above:

Error: Could not allocate space for VCF INFO string

I've explored several options, including the sorting-memory flag mentioned above:

cat non_coding_hg38_chr1.vcf | convert2bed -m 32G -i vcf - > non_coding_hg38_chr1.bed

This produces the same result, and watching the resources it does not look like a maximum-memory issue: I had 20GB already allocated and watched usage ramp to 34GB whilst running, so 14GB for the conversion before the crash (leaving 222GB of unallocated RAM). These aren't, in my experience, 'large' VCF files: chr1 has 1172838 variants and is 3GB in total.
I'm happy to provide any other information, although this is sensitive data, so providing the actual VCF files might be troublesome.
EDIT2: some additional testing confirms that this is linked to VCF file size. I took the first 1M lines from the chr1 VCF file mentioned above and ran them through the conversion with no error.
EDIT3: the limit is somewhere between 1031500 lines (works) and 1032000 lines (error) for me. I also created a 'dummy' VCF file with this many lines to check, with the same results.
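Given that threshold, one possible stopgap until a fix lands: split the variant lines into chunks below ~1M lines, re-attach the header to each chunk, and convert chunk by chunk. A rough sketch, using the file from above (chunk size and file names are placeholders):

# Keep the header aside; split the body into 1,000,000-line chunks
grep '^#' non_coding_hg38_chr1.vcf > header.txt
grep -v '^#' non_coding_hg38_chr1.vcf | split -l 1000000 - chunk_

# Convert each chunk with the header re-attached, appending to one BED file
for f in chunk_*; do
    cat header.txt "$f" | convert2bed -i vcf - >> non_coding_hg38_chr1.bed
done
rm header.txt chunk_*

Since each chunk is converted independently, a final sort-bed pass over the concatenated output is a sensible safety step if downstream tools require fully sorted input.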
I have not been able to replicate this with large (5 GB+) test inputs on my side. However, v2.4.21 will be released by tomorrow. Perhaps the changes in that version will help users who are having VCF problems.
Please see: https://github.com/bedops/bedops/releases/tag/v2.4.21 for package downloads and http://bedops.readthedocs.io/en/latest/content/revision-history.html#v2-4-21 for a list of features and fixes.
This does include modifications to convert2bed that are relevant to VCF parsing, so please let me know if this helps or not. Thanks!

Great, thanks Alex. I'll give the new version a whirl at some stage later this afternoon and report back.
All working as intended now, Alex. Thanks for your help, really appreciated! Whatever changes you made to convert2bed in the new version have sorted the issue.

Great, thanks for the feedback!