Question

Read VEP output from command line in R

0

4.1 years ago

yussab ▴ 100

Dear Biostars Community,

I can't find a way to read, in R, a VCF file that I annotated with VEP on the command line. I tried vcfR and ensemblVEP, but I couldn't work out how to handle the VEP.vcf file.

Thank you in advance, AY

vep ensembl R Bash • 2.9k views

ADD COMMENT • link updated 9 months ago by mimarcelape • 0 • written 4.1 years ago by yussab ▴ 100

score 1 · Answer 1 · 2021-03-17

1

4.1 years ago

prasundutta87 ▴ 710

In addition to the other answer, I would like to add that bcftools has a VEP-specific plugin (split-vep) which can also be used. More information can be found at: https://samtools.github.io/bcftools/howtos/plugin.split-vep.html

And of course, the resulting output file can then be easily imported into R.
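As a rough sketch of how the plugin might be used here (assuming bcftools with the split-vep plugin is installed, that the annotated file is called vep.vcf as in the other answer, and that the annotation tag is the default CSQ; the output name vep_split.tsv is made up for illustration):

```shell
# Assumptions: bcftools with the split-vep plugin is on PATH, and vep.vcf is the
# VEP-annotated VCF. vep_split.tsv is a hypothetical output name.
vcf=vep.vcf

if command -v bcftools >/dev/null 2>&1 && [ -f "$vcf" ]; then
    # List the CSQ subfields declared in the VCF header.
    bcftools +split-vep "$vcf" -l

    # Write one row per consequence (-d) and expand %CSQ into all of its
    # subfields, tab-delimited (-A tab).
    bcftools +split-vep "$vcf" -d -A tab \
        -f '%CHROM\t%POS\t%REF\t%ALT\t%CSQ\n' > vep_split.tsv
else
    echo "bcftools or $vcf not found; skipping" >&2
fi
```

The resulting tab-separated file has no header line, so in R something like read.delim('vep_split.tsv', header = FALSE) should work, with column names taken from the -l listing.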

score 0 · Answer 2 · 2021-03-17

0

4.1 years ago

dariober 15k

This is not a pure R solution, but it may be good enough. Assuming the VEP annotation is in the INFO tag CSQ and uses | as the field separator (the default):

bcftools query -f '%CHROM|%POS|%ID|%REF|%ALT|%QUAL|%FILTER|%INFO/CSQ\n' vep.vcf > vep.txt

Now vep.txt is a |-separated table with columns CHROM, POS, etc. The meaning of the VEP columns is given in the VCF header; you should be able to extract it with, e.g.:

bcftools view -h vep.vcf | grep '##INFO=<ID=CSQ,'

To read it in R you can use read.table('vep.txt', sep= '|', ...)
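One caveat with this approach: VEP writes one CSQ entry per allele/transcript consequence and joins the entries with commas, so a single line can hold several annotations and therefore more | characters than the header's field list suggests. A toy illustration (the CSQ string below is made-up data with only four fields per entry, not real VEP output):

```shell
# Toy CSQ value (made-up data): two transcript annotations, 4 fields each.
csq='A|missense_variant|GENE1|ENST0001,A|intron_variant|GENE1|ENST0002'

# Count the "|" characters in the whole value: 6, not the 3 that a single
# 4-field entry would contain.
n_pipes=$(printf '%s' "$csq" | tr -cd '|' | wc -c | tr -d ' ')

# Split on commas so that each annotation gets its own line.
printf '%s\n' "$csq" | tr ',' '\n' > csq_entries.txt
n_entries=$(wc -l < csq_entries.txt | tr -d ' ')

echo "pipes in the value: $n_pipes; annotations: $n_entries"
```

If that is the situation in your file, the rows need to be split on the commas within CSQ (keeping the CHROM, POS, etc. columns with each piece) before read.table can see a fixed number of columns.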

1

Thank you dariober, but this didn't work. I don't understand why my file contains more "|" characters than the "INFO-ID=CSQ" header line leads me to expect. How is this possible? It leaves me really confused.