Question

extracting parts of vcf file

3.5 years ago

storm1907 ▴ 30

Hello, I have the following VCF file (with a header); its columns contain this content:

chr1    10643146    .   G   GC  63.2    PASS    CSQ=|FAIL|0.00|0.00|0.01|0.00|13|40|-3|13|||MODIFIER|CASZ1|ENSG00000130940|ENST00000377022|protein_coding||19/20|||||,|FAIL|0.00|0.00|0.01|0.00|13|40|-3|13|||MODIFIER|AL139423.1|ENSG00000272078|ENST00000606802|lncRNA||1/1|||||  GT:GQ:DP:AD:VAF:PL  0/1:58:86:40,45:0.523256:63,0,59
chr1    10646034    .   G   C   64.8    PASS    CSQ=|FAIL|0.00|0.00|0.00|0.00|22|3|1|2|||MODIFIER|CASZ1|ENSG00000130940|ENST00000377022|protein_coding||17/20|||||,|FAIL|0.00|0.00|0.00|0.00|22|3|1|2|||MODIFIER|AL139423.1|ENSG00000272078|ENST00000606802|lncRNA||1/1|||||    GT:GQ:DP:AD:VAF:PL  0/1:59:27:13,14:0.518519:64,0,60

I would like to extract only the chromosomal position and the gene name, as two columns, so that my final file looks like:

chr1:10643146             CASZ1

Is there a way to do this with awk?

Thank you!

vcf plink • 959 views

ADD COMMENT • link updated 3.5 years ago by Ram 44k • written 3.5 years ago by storm1907 ▴ 30

score 2 · Answer 1 · 2021-05-19

3.5 years ago

Ram 44k

Take a look at bcftools query. The VCF data you show above seems to be the output of VEP, so searching online for "VEP extract fields" might yield some interesting results. See: https://samtools.github.io/bcftools/howtos/plugin.split-vep.html
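
Since the question asks about awk, here is a minimal sketch of that route, not a tested general solution. It assumes the file is tab-delimited, that INFO is column 8, and that the gene symbol is the 14th pipe-separated field of each CSQ entry, which matches the two records shown in the question but may differ for other VEP configurations. The function name extract_pos_gene is made up for illustration. Only the first CSQ entry (the first transcript) is used.

```shell
# Hypothetical helper: print "CHROM:POS<TAB>GENE" for each VCF record.
# Assumption: the gene symbol is field 14 of a CSQ entry, as in the
# records shown in the question.
extract_pos_gene() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      csq = ""
      n = split($8, info, ";")          # INFO is column 8
      for (i = 1; i <= n; i++)
        if (index(info[i], "CSQ=") == 1)
          csq = substr(info[i], 5)      # drop the leading "CSQ="
      if (csq == "") next               # no CSQ annotation on this record
      split(csq, entries, ",")          # one entry per annotated transcript
      split(entries[1], f, "|")         # keep the first entry only
      print $1 ":" $2 "\t" f[14]
    }
  ' "$@"
}
```

It could be called as extract_pos_gene input.vcf > positions_genes.txt, where both file names are placeholders. With no file argument it reads from standard input.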

ADD COMMENT • link 3.5 years ago by Ram 44k

This tool does not seem appropriate, as I got this message:

The field "Consequence" is not present in INFO/CSQ: "Consequence annotations from Ensembl VEP. Format: 'Allele

This is the column header line and the first record of my VCF:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample.F
chr1    69270   .       A       G       55.4    PASS    CSQ=|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000335137|protein_coding|1/1||216|60|S|tcA/tcG|,|FAIL|0.01|0.00|0.00|0.00|-10|26|-28|-25|||LOW|OR4F5|ENSG00000186092|ENST00000641515|protein_coding|3/3||303|81|S|tcA/tcG|    GT:GQ:DP:AD:VAF:PL      1/1:55:18:0,18:1:55,65,0

ADD REPLY • link 3.5 years ago by storm1907 ▴ 30

That plugin was just an example of a search result one would find. There should be other tools to parse VEP output. Plus, you may want to update that plugin so it works - that's how open source software stays relevant.
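
If no dedicated parser fits, the CSQ string can also be split by hand. The sketch below lists every distinct gene symbol per variant rather than only the first one. It relies on the same assumptions as the records posted in this thread: tab-delimited input, INFO in column 8, and the gene symbol as the 14th pipe-separated field of each CSQ entry. The function name list_genes_per_variant is hypothetical.

```shell
# Hypothetical helper: print "CHROM:POS<TAB>GENE1,GENE2,..." per record,
# listing each distinct gene symbol once, in order of appearance.
# Assumption: the gene symbol is field 14 of every CSQ entry.
list_genes_per_variant() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      csq = ""
      n = split($8, info, ";")          # INFO is column 8
      for (i = 1; i <= n; i++)
        if (index(info[i], "CSQ=") == 1)
          csq = substr(info[i], 5)      # drop the leading "CSQ="
      if (csq == "") next               # no CSQ annotation on this record
      genes = ""
      split("", seen)                   # reset the seen-set for this record
      m = split(csq, entries, ",")      # one entry per annotated transcript
      for (j = 1; j <= m; j++) {
        split(entries[j], f, "|")
        g = f[14]
        if (g != "" && !(g in seen)) {
          seen[g] = 1
          genes = (genes == "" ? g : genes "," g)
        }
      }
      print $1 ":" $2 "\t" genes
    }
  ' "$@"
}
```

For the chr1:10643146 record in the question this would report both CASZ1 and AL139423.1, so it is worth deciding up front whether overlapping lncRNA annotations should be kept.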

What was the result of your trials with bcftools query?

ADD REPLY • link 3.5 years ago by Ram 44k