8.0 years ago
lakhujanivijay
Objective: to convert vcf to bed
Tool used: bedops
Command:
/home/bedops/bin/convert2bed -i vcf < my.vcf > my.bed
Error encountered
Error: Could not allocate space for VCF INFO string
QUESTIONS:
A. What is the error source? Space/memory crunch? A quick google search of the error message pointed to source code:
    char *info_str = NULL;
    info_str = malloc(C2B_MAX_LONGER_LINE_LENGTH_VALUE);
    if (!info_str) {
        fprintf(stderr, "Error: Could not allocate space for VCF INFO string\n");
        exit(ENOMEM); /* Not enough space (POSIX.1) */
    }
B. What is the difference between vcf2bed and convert2bed? Which will solve my purpose?
The vcf2bed script is a convenient wrapper around convert2bed. You could run vcf2bed < in.vcf > out.bed, for instance.
It looks like the host computer ran out of memory to allocate to convert2bed at this step. It might be a bug, but I'd need to see the VCF file to be sure, so if you can post your VCF file somewhere that I can look at it, that would be helpful.
I'm about to release v2.4.21 with some new features and fixes, so this is a good time for me to look at this stuff. Let me know.
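For anyone scripting the conversion, here is a minimal Python sketch of the same stdin/stdout redirection that also keeps the exit code and stderr, so an allocation failure is not silently lost. The function name convert_vcf_to_bed and its cmd parameter are made up for illustration; only the convert2bed -i vcf invocation comes from this thread, and the binary is assumed to be on PATH.

```python
import subprocess


def convert_vcf_to_bed(vcf_path, bed_path, cmd=("convert2bed", "-i", "vcf")):
    """Feed vcf_path to the converter on stdin and write its stdout to bed_path.

    Returns (exit_code, stderr_text). The default cmd assumes convert2bed
    is on PATH; pass a full path such as /home/bedops/bin/convert2bed if not.
    """
    with open(vcf_path, "rb") as src, open(bed_path, "wb") as dst:
        proc = subprocess.run(
            list(cmd), stdin=src, stdout=dst, stderr=subprocess.PIPE
        )
    return proc.returncode, proc.stderr.decode(errors="replace")
```

Since the source excerpt in the question calls exit(ENOMEM) on this failure, a nonzero exit code together with the message on stderr should be enough to tell a failed conversion from a successful one.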
Thank you for the information on the difference between the two. Is there any way I can allocate more memory or assign more threads? Due to the confidential nature of the data, I will not be able to share it. What I can tell you at this point is that the file was created using GATK (HaplotypeCaller) and that the VCF file version is 4.2.
Can you try it like this:
cat your.vcf | /home/bedops/bin/convert2bed -i vcf - > my.bed
Regarding memory, you can use -m, but as the help says, it is only for sorting the BED output.
It's not a question of allocating more memory through a setting. The error is that the computer you ran this on had no more memory left to allocate to the conversion program. Were you running another program at the same time, which would use a lot of memory?
I have been having the same issue, along with this one: Error: Could not allocate space for VCF FORMAT string. I am running the command on my university server and it gives me this same error!
If you are able to share details about your setup and input, that would be helpful. Thanks for the feedback.
It's a Linux server at our university, which I access over an SSH connection. I use a Mac with 16 GB RAM and an i5 processor. The input is a VCF file that has undergone filtering steps in PLINK (MAF etc.), so it is in 4.2 format according to the header info. If I remove the header completely (i.e., all the # lines), then I get this error: Could not allocate space for VCF SAMPLE string.
The first few lines of the file -
"If I remove the header completely (i.e., all the # lines) then I get this error"
Why are you removing the header before conversion?
I just wanted to see what was wrong in my INFO or FORMAT string. But the normal VCF file with headers still produced the same errors when using BEDOPS. So it would be great if you could tell me what's causing this problem.
If you can post your VCF file somewhere I can download it, and tell me more about your setup (version of operating system, BEDOPS, etc.) I can try to reproduce the problem and troubleshoot further. At the moment, I don't have enough data to confirm or repeat the problem, which I will need in order to fix it. Thanks for the feedback.
I'm getting the same error and I can confirm that my system is not running out of memory. Here are some details:
System:
Software - Running bedops version: 2.4.20
I had a batched job running, and when I went to inspect the results I noticed that I was missing data for chromosomes 1, 2 and 5; all others had completed successfully. I went to the logs and saw the same error reported above:
Error: Could not allocate space for VCF INFO string
I've explored several options, including the sorting memory setting mentioned above:
cat non_coding_hg38_chr1.vcf | convert2bed -m 32G -i vcf - > non_coding_hg38_chr1.bed
This produces the same result, and from watching the resources it doesn't look like a maximum memory issue: I had 20GB already allocated and watched it ramp to 34GB whilst running, so 14GB total before the crash (leaving 222GB of unallocated RAM).
These aren't, in my experience, 'large' VCF files, with chr1 having 1172838 variants and being 3GB in total size.
Any other information I am happy to provide, although this is sensitive data so providing the actual vcf files might be troublesome.
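For a batched run like this, a quick way to see which jobs hit the error is to scan the logs for the three messages quoted in this thread (INFO, FORMAT and SAMPLE). A minimal Python sketch follows; the function name failed_logs, the one-log-per-chromosome layout and the *.log naming are assumptions for illustration, not something BEDOPS produces.

```python
import re
from pathlib import Path

# The three allocation errors reported in this thread.
ALLOC_ERROR = re.compile(
    r"Error: Could not allocate space for VCF (INFO|FORMAT|SAMPLE) string"
)


def failed_logs(log_dir, pattern="*.log"):
    """Return {log file name: VCF field named in the error} for affected logs.

    Assumes one log file per job in log_dir (hypothetical layout).
    """
    hits = {}
    for log in sorted(Path(log_dir).glob(pattern)):
        match = ALLOC_ERROR.search(log.read_text(errors="replace"))
        if match:
            hits[log.name] = match.group(1)
    return hits
```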
EDIT: first post on Biostars, markdown formatting edits
EDIT2: some additional testing confirms that this is linked to VCF file size. I took the first 1M lines from the chr1 VCF file mentioned above and ran through the conversion with no error.
EDIT3: the limit is somewhere between 1031500 (works) and 1032000 (error) lines for me. Also created a 'dummy' vcf file with this many lines to check, same results.
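In case it helps anyone reproduce this without sharing sensitive data, here is a rough Python sketch of the approach from EDIT3: write a dummy VCF with a chosen number of records, then binary-search for the smallest count that fails. The names write_dummy_vcf and first_failing_count are made up for illustration, the records are placeholders rather than real variants, and the works callback is assumed to run the conversion on a file of that size and report success.

```python
def write_dummy_vcf(path, n_records):
    """Write a minimal VCF 4.2 file with n_records placeholder variants."""
    with open(path, "w") as fh:
        fh.write("##fileformat=VCFv4.2\n")
        fh.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
        for pos in range(1, n_records + 1):
            # Placeholder record; "." marks missing ID, QUAL and INFO.
            fh.write(f"chr1\t{pos}\t.\tA\tG\t.\tPASS\t.\n")


def first_failing_count(works, lo, hi):
    """Return the smallest n in (lo, hi] for which works(n) is False.

    Assumes works(lo) is True, works(hi) is False, and that every count
    above the first failure also fails.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return hi
```

Starting from the bracket above (1031500 works, 1032000 fails), this would need about nine conversions to pin down the exact count, assuming the failure depends only on the number of lines.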
I have not replicated this with large (5 GB+) test inputs on this side. However, there is a version v2.4.21 that will be released by tomorrow. Perhaps changes in that version will help users who are having VCF problems.
Please see: https://github.com/bedops/bedops/releases/tag/v2.4.21 for package downloads and http://bedops.readthedocs.io/en/latest/content/revision-history.html#v2-4-21 for a list of features and fixes.
This does include modifications to convert2bed that are relevant to VCF parsing, so please let me know if this helps or not. Thanks!
Great, thanks Alex. I'll give the new version a whirl at some stage later this afternoon and report back.
All working as intended now Alex, thanks for your help, really appreciated! Whatever changes you made to convert2bed in the new version have sorted the issue.
Great, thanks for the feedback!