I'm trying to use vcf2bed to convert a ~0.5TB .vcf file to .bed, but I can't figure out why this command isn't working. The job completes in about a second and produces nothing but a blank output file. Please let me know what the problem might be.
vcf2bed < /path/input.vcf > /path/output.bed
Also, it would be nice if anybody could give me an estimate on how large the output .bed file will be when working with a 0.5TB input.
To confirm that the format of the input is correct, here's part of the file:
My command and .vcf appear to be consistent with the instructions in the notebook and several forum posts, so I don't know how to get past this. Why is the .bed output 0 bytes?
It is likely that your /tmp folder is filling up with intermediate data during the sorting step. Some /tmp or swap folders are not large enough to hold intermediate results.
Use --sort-tmpdir <dir> with vcf2bed to specify an alternative directory <dir> that can contain more than 500 GB of data (a worst-case scenario, where all variants are on one chromosome).
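As a rough sketch of the first option: the helper name convert_with_tmpdir and all of the paths are made up for illustration, and I'm writing the option in the --sort-tmpdir=<dir> form. It refuses to run on a missing or empty input, so a bad path shows up as an error rather than as a 0-byte output.

```shell
# Sketch only: convert_with_tmpdir is a hypothetical helper, not part of BEDOPS.
# Usage: convert_with_tmpdir /path/input.vcf /path/output.bed /path/to/large/tmp
convert_with_tmpdir() {
    input=$1
    output=$2
    tmpdir=$3

    # Fail loudly instead of silently writing a 0-byte file.
    if [ ! -s "$input" ]; then
        echo "input is missing or empty: $input" >&2
        return 1
    fi

    # Make sure the scratch directory exists, then report its free space.
    mkdir -p "$tmpdir" || return 1
    df -h "$tmpdir" >&2

    # Keep intermediate sort data in $tmpdir instead of the default temp location.
    vcf2bed --sort-tmpdir="$tmpdir" < "$input" > "$output"
}
```

Check the free-space figure printed by df before committing to a long run; the directory should have room for roughly the size of the input.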
Alternatively, use --do-not-sort with vcf2bed to keep the result unsorted, and then sort afterwards with sort-bed --tmpdir <dir>, which accomplishes the same result.
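The two-step route might look something like the sketch below. The helper name, the ".unsorted" suffix and the 2G default are my own placeholders. One caveat from memory of the sort-bed help text: --tmpdir only takes effect together with --max-mem, so the sketch passes both; please confirm with sort-bed --help on your install.

```shell
# Sketch only: convert_then_sort is a hypothetical helper, not part of BEDOPS.
# Usage: convert_then_sort /path/input.vcf /path/output.bed /path/to/large/tmp [max-mem]
convert_then_sort() {
    input=$1
    output=$2
    tmpdir=$3
    maxmem=${4:-2G}              # assumed default; tune to your machine
    unsorted="$output.unsorted"  # intermediate file, same directory as the output

    mkdir -p "$tmpdir" || return 1

    # Step 1: convert without sorting, so no temporary sort data is written.
    vcf2bed --do-not-sort < "$input" > "$unsorted" || return 1

    # Step 2: sort separately, spilling intermediate data into $tmpdir.
    sort-bed --max-mem "$maxmem" --tmpdir "$tmpdir" "$unsorted" > "$output"
}
```

The unsorted intermediate is left in place so you can inspect it if the sort fails; delete it once the sorted file looks right.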
If the BED file is too large, you can use vcf2starch to create a Starch archive directly from the VCF instead. This will be about twice as efficient as compression with gzip. The BEDOPS documentation describes Starch files and the format in more detail. BEDOPS tools work natively with Starch as well as BED.
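For the Starch route, a minimal sketch could be the following. The helper name and paths are hypothetical; it writes the archive and then uses unstarch to print the first few records as a sanity check.

```shell
# Sketch only: vcf_to_starch is a hypothetical helper, not part of BEDOPS.
# Usage: vcf_to_starch /path/input.vcf /path/output.starch
vcf_to_starch() {
    input=$1
    archive=$2

    # Convert the VCF straight into a compressed Starch archive.
    vcf2starch < "$input" > "$archive" || return 1

    # Sanity check: extract and show the first five records.
    unstarch "$archive" | head -n 5
}
```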
Wild guess: wrong path to input file. What's your exact command line?
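A quick way to rule that out before rerunning the conversion is sketched below. check_vcf_input is a made-up helper name; it only uses standard tools, and it prints the first few non-header lines so you can confirm the file actually contains records.

```shell
# Sketch only: check_vcf_input is a hypothetical helper using standard tools.
# Usage: check_vcf_input /path/input.vcf
check_vcf_input() {
    input=$1

    if [ ! -e "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi
    if [ ! -s "$input" ]; then
        echo "file is empty: $input" >&2
        return 2
    fi

    # Show the size on stderr, then the first three data records on stdout.
    ls -l "$input" >&2
    grep -v '^#' "$input" | head -n 3
}
```

If this prints no records, the conversion has nothing to work on no matter which vcf2bed options you pass.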
Did this solution work for you? I am having the same issue, and this solution did not fix it. Any help would be greatly appreciated.
I didn't figure out the cause of the problem, so I used PLINK to convert the VCF to bed instead of using vcf2bed. It's very easy with PLINK. https://bioinformatics.stackexchange.com/questions/3667/converting-vcf-file-to-plink-bed-bim-fam-files