Question

Extracting certain columns from VCF file

3

6.5 years ago

gradstudentNew ▴ 50

Hello all,

I've recently been trying to extract only certain columns from an ANNOVAR-annotated VCF file with vcftools. I ran the following command:

vcftools --vcf file_ANNOVAR.vcf --recode-INFO ExAC_SAS_AF --recode-INFO rs_dbSNP147 --out OUTPUT.vcf

but it unfortunately isn't working. Does anyone have any tips on what else I could try? I don't know what the column # is because the file is too big to open on my computer (I'm doing everything via SSH).

vcf genotype vcftools exome • 9.1k views

ADD COMMENT • link updated 20 months ago by Ram 44k • written 6.5 years ago by gradstudentNew ▴ 50

1

If you want a GUI-based program, this is the one to use

ADD REPLY • link 6.5 years ago by Chirag Parsania ★ 2.0k

0

Please post the input VCF (with headers and a few example records) and the columns you want to extract @OP

ADD REPLY • link 6.5 years ago by cpad0112 21k

1

Hey guys, I ended up using some Perl scripting to fix my issue. I realized that everything was being printed in the 9th column, i.e. Exac|gnomad|..|..|, so I ended up splitting that column and then pasting / joining the ones I needed. :) Thank you all for the help!

ADD REPLY • link 6.5 years ago by gradstudentNew ▴ 50

0

You're welcome dude

ADD REPLY • link 6.5 years ago by Kevin Blighe 88k

GenoMax · Answer 1 · 2018-05-27

6

6.5 years ago

Kevin Blighe 88k

You need to switch from VCFtools to BCFtools, in particular bcftools query.

It looks like you not only want certain columns but also certain key-value pairs within the primary VCF columns, which are tab-delimited.

Here are some examples from one of my own VCFs that should help:

bcftools query -f'[%CHROM:%POS %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511 1/1
1:69761 0/1
1:752721 0/1
1:752894 1/1
1:762273 0/1

.

bcftools query -f'[%CHROM:%POS:%REF:%ALT %SAMPLE %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511:A:G 2701 1/1
1:69761:A:T 2701 0/1
1:752721:A:G 2701 0/1
1:752894:T:C 2701 1/1
1:762273:G:A 2701 0/1

It should be fairly obvious what those are doing. To extract certain values from the INFO column, which is what you appear to need, you can do the following:

bcftools query -f'[%CHROM:%POS:%REF:%ALT %INFO/HaplotypeScore:%INFO/VQSLOD %SAMPLE %GT\n]' 2701.snvindel.var.vcf.gz | head -5
1:69511:A:G 0.9159:-6.231 2701 1/1
1:69761:A:T 0:-9.034 2701 0/1
1:752721:A:G 0:-1.447 2701 0/1
1:752894:T:C 0:-6.798 2701 1/1
1:762273:G:A 5.3647:-2.236 2701 0/1

Here, HaplotypeScore and VQSLOD are tags defined in my INFO field.
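Tags such as HaplotypeScore and VQSLOD can only be queried by name if the file declares them in its ##INFO header lines, so it can help to list those IDs first. A minimal sketch, assuming a plain-text header on stdin; the function name list_info_tags and the two-line demo header are made up for illustration:

```shell
# List the ID of every ##INFO header line in a VCF read from stdin.
# list_info_tags is a hypothetical helper name, not part of any tool.
list_info_tags() {
    grep '^##INFO=<ID=' | sed -e 's/^##INFO=<ID=//' -e 's/,.*$//'
}

# Tiny made-up header, only to show the shape of the output.
demo_header='##fileformat=VCFv4.2
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="example">
##INFO=<ID=VQSLOD,Number=1,Type=Float,Description="example">'

printf '%s\n' "$demo_header" | list_info_tags
```

On a real file the header could be fed in with something like `bcftools view -h file_ANNOVAR.vcf | list_info_tags`, which prints only the header and so avoids opening the whole file.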

Kevin

ADD COMMENT • link 6.5 years ago by Kevin Blighe 88k

0

I'm really new to bioinformatics, so thank you so much for your help! I tried doing that and it said that the column(s) didn't exist. I'm not sure whether it's because of how my VCF file is formatted? My info header looks like this:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature

Do you think the "|" is affecting anything?

ADD REPLY • link updated 6.5 years ago by GenoMax 147k • written 6.5 years ago by gradstudentNew ▴ 50

0

Ah! In this case, the INFO key is called CSQ (%INFO/CSQ), and all of the individual annotations are packed into its value as a single pipe-delimited string, so bcftools query will only be able to extract the entire string that contains all of your annotation.

You can still, nevertheless, do that and then do some post filtering with cut, sed, awk, or other commands. How is your experience with these commands?
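As a rough illustration of that kind of post-filtering, here is an awk sketch that pulls one pipe-separated sub-field out of the annotation string. It assumes the input lines look like "CHROM POS ANNOTATION", as something like `bcftools query -f '%CHROM %POS %INFO/CSQ\n'` would print for the header shown above (where the tag ID is CSQ), and it only looks at the first transcript when several are comma-separated. The function name pick_subfield and the sample record are made up:

```shell
# Print CHROM, POS and the Nth "|"-separated sub-field of the annotation.
# Input: whitespace-separated lines of the form "CHROM POS ANNOTATION".
# pick_subfield is a hypothetical helper; $1 is the 1-based sub-field position.
pick_subfield() {
    awk -v n="$1" '{
        split($3, transcripts, ",")        # one entry per annotated transcript
        split(transcripts[1], parts, "|")  # sub-fields of the first transcript
        value = (parts[n] == "") ? "." : parts[n]   # empty sub-field -> "."
        print $1, $2, value
    }'
}

# Made-up record: sub-field 4 is SYMBOL in the header order shown above.
printf '1 877831 G|missense_variant|MODERATE|SAMD11\n' | pick_subfield 4
```

Splitting inside awk rather than with cut keeps CHROM and POS attached to the extracted value, which makes the output easier to join with other tables later.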

ANNOVAR can output in CSV format, by the way. That would be much easier for you, surely?

ADD REPLY • link 6.5 years ago by Kevin Blighe 88k

0

Hi, I need to extract variants with gnomAD_AF information from the CSQ field. bcftools query returns only dots, even though I can check manually with less that there are values for gnomAD_AF. I would really appreciate help!

bcftools query -f '%CHROM %POS %INFO/gnomAD_AF\n' FILE | head -3
1 877831 .
1 949608 .
1 977156 .
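One likely explanation, given the header discussed earlier in the thread, is that gnomAD_AF is not an INFO tag of its own but one of the pipe-separated sub-fields inside CSQ, so %INFO/gnomAD_AF has nothing to match. A sketch of how the position of a named sub-field could be looked up from the "Format:" list in the CSQ header line; csq_index is a hypothetical helper, and the demo header is a shortened, made-up example that assumes gnomAD_AF appears in that list:

```shell
# Print the 1-based position of a named sub-field in the CSQ "Format:" list.
# Reads VCF header lines on stdin; prints nothing if the name is absent.
# csq_index is a hypothetical helper name.
csq_index() {
    grep '^##INFO=<ID=CSQ,' \
        | sed -e 's/.*Format: *//' -e 's/".*$//' \
        | tr '|' '\n' \
        | grep -n -x -F -- "$1" \
        | cut -d: -f1
}

# Shortened, made-up header line for illustration only.
demo_header='##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|gnomAD_AF">'

printf '%s\n' "$demo_header" | csq_index gnomAD_AF
```

The number it prints is the position to use when splitting the CSQ string on `|`. Recent bcftools releases also include a split-vep plugin aimed at VEP annotations of this kind, which may be worth checking in the bcftools documentation.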