Question

Update 'ID' column in VCF file using BCFtools annotate

5

7.9 years ago

mrxcm3 ▴ 80

I have a very large VCF file where the 'ID' column is a unique ID of the form 'chr:bp'. I would like to update the 'ID' column to dbSNP IDs.

I have downloaded a BED file [chr, from, to, rsid], which I have sorted and tabix-indexed. The BED file is for hg19, which is correct for my data; chromosomes are formatted with 'chr' and there is no header.

It seems that bcftools annotate can update the 'ID' column, but it is not clear to me how. I have tried:

i) bcftools annotate -a dbsnp.bed.gz -c 'CHROM,POS,-,ID' my.vcf.gz

ii) bcftools annotate -a dbsnp.bed.gz -c 'CHROM,FROM,TO,ID' my.vcf.gz

Neither of these updated the ID column. I also tried removing the 'ID' column from the VCF first (-R) and piping the VCF into the two commands above. Perhaps this is not the right tool? Any advice appreciated.

bcftools vcftools • 21k views

ADD COMMENT • link updated 3.0 years ago by from the mountains ▴ 250 • written 7.9 years ago by mrxcm3 ▴ 80

3

7.9 years ago

WouterDeCoster 47k

It might be possible with bcftools annotate, but I use SnpSift annotate for the same job. It takes a VCF file from dbSNP as the annotation source.

ADD COMMENT • link 7.9 years ago by WouterDeCoster 47k

score 4 · Accepted Answer · 2016-12-16

4

7.9 years ago

William ★ 5.3k

bcftools annotate -c CHROM,FROM,TO,ID -a my_ids.bed.gz -o output.vcf input.vcf.gz

works for me.

It also took me some time to get it to work, and it is hard to debug where the mistake is.

Some things to try / check:

VCF is 1-based, BED is 0-based: POS 10 in VCF is start 9, end 10 in BED. https://genome.ucsc.edu/FAQ/FAQformat#format1
Make sure your chromosome names match exactly; Chr1 and chr_1 are not the same for bcftools.
Remove the quotes around the column list 'CHROM,FROM,TO,ID' -> CHROM,FROM,TO,ID
Test with a very small subset of your VCF and BED file that should produce an annotated VCF file. This makes it faster to debug and test different options / formattings until you get it right.
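
To make the coordinate check and the "small subset" advice concrete, here is a minimal sketch. The file names, the two VCF records and the rsTEST IDs are all invented for illustration. It builds a tiny VCF, derives the matching BED rows (start = POS - 1, end = POS), and only runs bcftools if it happens to be installed. As far as I understand the bcftools documentation, an annotation file whose name does not end in .bed or .bed.gz is treated as a 1-based tab-delimited file, so the suffix matters here.

```shell
#!/usr/bin/env bash
# Scratch directory; all file names below are hypothetical.
workdir=$(mktemp -d)

# Tiny VCF with two records whose ID column is still chr:bp.
{
    printf '##fileformat=VCFv4.2\n'
    printf '##contig=<ID=chr1>\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t10\tchr1:10\tA\tG\t.\t.\t.\n'
    printf 'chr1\t25\tchr1:25\tC\tT\t.\t.\t.\n'
} > "$workdir/tiny.vcf"

# Build the matching BED rows: start = POS - 1, end = POS.
# The rsTEST names are invented placeholders, not real dbSNP IDs.
awk 'BEGIN { OFS = "\t" } !/^#/ { n++; print $1, $2 - 1, $2, "rsTEST" n }' \
    "$workdir/tiny.vcf" > "$workdir/ids.bed"
cat "$workdir/ids.bed"

# Only attempt the annotation when the htslib tools are on PATH.
if command -v bcftools >/dev/null 2>&1 && command -v bgzip >/dev/null 2>&1 \
        && command -v tabix >/dev/null 2>&1; then
    bgzip -c "$workdir/ids.bed" > "$workdir/ids.bed.gz"
    tabix -p bed "$workdir/ids.bed.gz"
    bcftools annotate -c CHROM,FROM,TO,ID -a "$workdir/ids.bed.gz" \
        -o "$workdir/annotated.vcf" "$workdir/tiny.vcf"
    # Show CHROM, POS and the (hopefully updated) ID column.
    grep -v '^#' "$workdir/annotated.vcf" | cut -f1-3
fi
```

If the ID column in the last output still shows chr:bp, the chromosome names or the coordinates in the BED file are the first things to compare against the VCF.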

ADD COMMENT • link 7.9 years ago by William ★ 5.3k

0

This worked. It was my chromosome names that were the issue. I thought the BED file had to use the 'chr1' format in order to be tabix-indexed.

ADD REPLY • link 7.9 years ago by mrxcm3 ▴ 80

0

@mrxcm3,

How do you fix chromosome names?

ADD REPLY • link 4.3 years ago by lovelymaoqin • 0

1

bcftools annotate with the --rename-chrs parameter

ADD REPLY • link 3.0 years ago by from the mountains ▴ 250
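
For anyone landing here later, a rough sketch of both ways to make the names agree. It assumes the mismatch is just a 'chr' prefix on chromosomes 1-22, X and Y; the direction of the mapping, the file names and the rsTEST IDs are assumptions, so adapt them to your data. --rename-chrs takes a text file with one "old_name new_name" pair per line. Alternatively, the first column of the BED file can be rewritten with sed before it is compressed and indexed.

```shell
#!/usr/bin/env bash
# Scratch directory; all file names below are hypothetical.
workdir=$(mktemp -d)

# Option 1: map file for --rename-chrs, one "old new" pair per line.
# This direction strips the prefix (chr1 -> 1); swap the columns to add it.
for c in $(seq 1 22) X Y; do
    printf 'chr%s %s\n' "$c" "$c"
done > "$workdir/chr_map.txt"

# Apply it to the VCF only if bcftools and the input file are available.
if command -v bcftools >/dev/null 2>&1 && [ -f my.vcf.gz ]; then
    bcftools annotate --rename-chrs "$workdir/chr_map.txt" \
        -Oz -o my.renamed.vcf.gz my.vcf.gz
fi

# Option 2: leave the VCF alone and strip the prefix from the BED instead.
# Two made-up BED rows stand in for the real dbSNP file.
printf 'chr1\t9\t10\trsTEST1\nchrX\t99\t100\trsTEST2\n' > "$workdir/ids.bed"
sed 's/^chr//' "$workdir/ids.bed" > "$workdir/ids.nochr.bed"
cat "$workdir/ids.nochr.bed"
```

Whichever file is changed, it has to be bgzip-compressed and tabix-indexed again afterwards, since the old index refers to the old names.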