Question

Converting a VCF with SNPs and indels to BED format

5

Entering edit mode

10.4 years ago

onter ▴ 170

Does anyone know of tools or have custom scripts (aside from "vcf2bed") that are able to convert between a VCF containing both indels and SNPs into BED format? The tricky part is creating the correct BED regions to capture the indel variants, and I haven't been able to find anything on the internet thus far.

next-gen • 42k views

ADD COMMENT • link updated 13 months ago by Axzd ▴ 80 • written 10.4 years ago by onter ▴ 170

3

Entering edit mode

Hello gareth862!

It appears that your post has been cross-posted to another site: SEQanswers.

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 10.4 years ago by Devon Ryan 104k

0

Entering edit mode

What result are you expecting?

ADD REPLY • link 10.4 years ago by Alex Reynolds 36k

1

Entering edit mode

We are trying to get the "affected positions" from the vcf. Our plan was to use bed tools to intersect our bed file with the vcf affected positions to produce a bed file of reference ranges

If the allele matches the ref then it wouldn't write a range for it
If deletion the range would include all deleted positions
If a variation the range would be only the changed bases
1 vcf line can generate multiple bed lines

ADD REPLY • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by carterc ▴ 10

Ram · Answer 1 · 2014-07-11

9

Entering edit mode

10.4 years ago

Alex Reynolds 36k

The vcf2bed script will handle the last three issues:

Use --deletions to convert deletions to one set
Use --snvs to convert single-nucleotide variants to convert to a second set
Conversion of one VCF line will generate multiple BED lines if there are multiple alternate alleles

You can use bedops --everything to union the deletion and single-base variant sets:

$ vcf2bed --deletions < foo.vcf > foo_deletions.bed
$ vcf2bed --snvs < foo.vcf > foo_snvs.bed
$ bedops --everything foo_{deletions,snvs}.bed > foo.bed

Once you have your deletions and SNVs in one file, it's easy to deal with the first issue. The conversion script maps the REF and ALT fields of the VCF input to the sixth and seventh columns of the BED output. So you can filter out lines from the BED output where those columns are equivalent, using a simple awk statement:

$ awk '$6 != $7' foo.bed > foo_filtered.bed

ADD COMMENT • link updated 3.0 years ago by Ram 44k • written 10.4 years ago by Alex Reynolds 36k

0

Entering edit mode

Alex, I get following error with the above command.

Error: Could not allocate space for VCF INFO string

I have updated the bedops to the latest version (2.4.28) as you mentioned here. This update resolves the same error from convert2bed command. But the issue with vcf2bed remains. I would like to use vcf2bed over convert2bed since the deletions and the SNVs can be processed separately using vcf2bed.

Any suggestions?

ADD REPLY • link 7.2 years ago by Satyajeet Khare ★ 1.6k

0

Entering edit mode

vcf2bed is just a wrapper that calls convert2bed. If the issue shows up with vcf2bed and does not happen with convert2bed, I would suggest there's a problem with your installation. I would look in /usr/local/bin and make sure that vcf2bed is pointing to the current convert2bed.

ADD REPLY • link 7.2 years ago by Alex Reynolds 36k

0

Entering edit mode

Oh yes. I am running the latest version locally (through directory). Its probably trying to access the old version which is installed globally.

ADD REPLY • link 7.2 years ago by Satyajeet Khare ★ 1.6k

1

Entering edit mode

The wrapper vcf2bed just uses whatever convert2bed it can find first in your environment PATH. You might try editing your PATH so that your local copy comes up first, or just upgrade the global copy.

Otherwise, you can use all the same functions in convert2bed as in vcf2bed, e.g.:

$ convert2bed --input=vcf --insertions < snps.vcf > insertions.bed

Etc.

You can run convert2bed --help-vcf for a listing of VCF-specific options, which should be exactly those offered by the wrapper script.

ADD REPLY • link 7.2 years ago by Alex Reynolds 36k

0

Entering edit mode

I used the same functions in the convert2bed as you mentioned. It works clean. Thanks!

ADD REPLY • link 7.2 years ago by Satyajeet Khare ★ 1.6k

1

Entering edit mode

Great to hear! Feel free to follow up if you have any more questions or problems.

ADD REPLY • link 7.2 years ago by Alex Reynolds 36k

0

Entering edit mode

Sorry,

I have some .vcf files contain only snp (I guess because I have separate vcf files for somatic indels and structural variants)

I used this command to convert vcf to bed

vcf2bed --snvs < foo.vcf > foo_snvs.bed

My bed does not have any column named strand

How I can extract the stand column from vcf please?

Thanks

ADD REPLY • link 5.8 years ago by zizigolu ★ 4.3k

1

Entering edit mode

I think, but am not 100% positive, that variant alleles are called against the forward strand-oriented reference base. So you could simply assume a "+".

Verifying this may be worth a separate question on biostars, and there are many people here who would likely be able to confirm this — or tell you that I'm wrong, which happens often. I'd double-check what I'm saying here.

If you must have a strand field in a BED file, and you can assume use of "+" (forward-oriented), and you don't mind losing the existing columns 6 and so on from what comes out of vcf2bed, then you could perhaps do something like the following:

$ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6

You could do some more work with awk if you need to insert the strand column and preserve the existing columns. I'll leave that as an exercise.

ADD REPLY • link 5.8 years ago by Alex Reynolds 36k

0

Entering edit mode

Sorry

I don't know I got error

[fi1d18@cyan01 ~]$ module load bedtools/2.21.0
[fi1d18@cyan01 ~]$ $ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6
bash: $: command not found
[fi1d18@cyan01 ~]$

ADD REPLY • link 5.8 years ago by zizigolu ★ 4.3k

1

Entering edit mode

Please try removing the leading "$" from:

$ cut -f1-5 foo_snvs.bed | awk -v FS="\t" -v OFS="\t" '{ print $1, $2, $3, $4, $5, "+"; }' > foo_snvs.bed6

ADD REPLY • link 5.7 years ago by jeb247 ▴ 10

score 3 · Answer 2 · 2023-04-06

3

Entering edit mode

20 months ago

Pierre Lindenbaum 164k

bcftools query -f '%CHROM\t%POS0\t%END\n' in.vcf

ADD COMMENT • link 20 months ago by Pierre Lindenbaum 164k

0

Entering edit mode

Hello, isn't bed 0-based for the start and 1-based for the end, while vcf POS is 1-based? so shouldn't we pipe this into awk to do a "minus 1"? What am I missing here?

EDIT: I misread the oneliner, this comment is obsolete.

ADD REPLY • link 13 months ago by Axzd ▴ 80

1

Entering edit mode

I wrote %POS0, not %POS

ADD REPLY • link 13 months ago by Pierre Lindenbaum 164k

0

Entering edit mode

My fault, you are of course right.

ADD REPLY • link 13 months ago by Axzd ▴ 80

score 0 · Answer 3 · 2020-06-30

0

Entering edit mode

4.4 years ago

svp ▴ 680

Try the following code:

awk '! /\#/' input.vcf | awk '{if(length($4) > length($5)) print $1"\t"($2-1)"\t"($2+length($4)-1); else print $1"\t"($2-1)"\t"($2+length($5)-1)}' > output.bed

This should work!

ADD COMMENT • link 4.4 years ago by svp ▴ 680

1

Entering edit mode

Offering a slight improvement on svp's answer:

bcftools norm -m- input.vcf | grep -v '^#' | awk -v OFS='\t' '{if(length($4) > length($5)) print $1,$2,$2+length($4)-1; else print $1,$2-1,$2+length($5)-1}' | bedtools merge -i - > output.bed

A few key points:

breaking multi-allelic variants so that length($4) and length($5) are correct
for deletions (length($4) > length($5)), the deleted base actually starts at the vcf coordinate +1 (e.g. this is a 1bp deletion at position 101: chr1 100 AT A) so the correct bed 0-based coordinate is $2,$2+length($4)-1 as opposed to $2-1,$2+length($4)-1
we can also merge overlapping intervals using bedtools merge

ADD REPLY • link 3.4 years ago by skanterakis ▴ 130

score 0 · Answer 4 · 2023-04-06

You can use UCSC vcfToBed utility tool.

vcfToBed foo.vcf foo

$ vcfToBed

vcfToBed - Convert VCF to BED9+ with optional extra fields.

usage: vcfToBed in.vcf outPrefix

options: -fields=comma-sep list of tags to include in the bed file, other fields will be placed into out.extraFields.tab

NOTE: Extra VCF tags (that aren't listed in -fields) get placed into a separate tab file for later indexing.