Filtering VCF files with bcftools
9 months ago
eleni.psar • 0

Hello everyone,

From an annotated VCF file generated with ANNOVAR, I am trying to extract the variants that are exonic or splicing, that do not lead to a synonymous mutation, and that have a gnomAD allele frequency lower than 0.01. I want to save these variants in a txt file. To do that, I use this command:

bcftools query \
  -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%INFO/Func.refGene\t%INFO/Gene.refGene\t%INFO/GeneDetail.refGene\t%INFO/ExonicFunc.refGene\t%INFO/AAChange.refGene\t%INFO/gnomad40_genome_AF\n' \
  -i '(INFO/Func.refGene="exonic" || INFO/Func.refGene="exonic;splicing" || INFO/Func.refGene="splicing") && INFO/ExonicFunc.refGene!="synonymous" && INFO/gnomad40_genome_AF<0.01' \
  input.vcf > output.txt

I get this Error: cannot use arithmetic operators to compare strings and numbers

Does anyone know how I can change my command in order to extract variants with a gnomAD allele frequency lower than 0.01?

Thank you in advance!

VCF vcftools bcftools • 590 views
9 months ago
vinayjrao ▴ 250

Hi,

I also do this, but since I didn't have much time to look into tools, I opted for a basic shell command approach, which is slightly more complicated, but I'm trying to keep it as simple as possible here.

What I do is convert the ANNOVAR csv output into tab (you could also directly ask for a tab-delimited output, but I made the script when I was still using wANNOVAR).

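Next, grab the header line with grep -w Chr filename.txt > new_file.txt

Then try tail -n +2 filename.txt | grep -vw "synonymous SNV" >> new_file.txt. Assuming the header is the first line, this skips it (it was already written in the previous step) and removes all synonymous variants. The -w flag matters here: it makes grep match "synonymous SNV" only as a whole word, so rows labelled "nonsynonymous SNV" are kept.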

Edit: This is assuming you don't have the term "synonymous" in any other columns in the file. Else you could use awk as detailed in the next command to remove the synonymous variants.
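
The awk alternative for this step could look roughly like the sketch below. It is not a tested recipe: the function name drop_synonymous is made up for illustration, and the column number you pass to it is an assumption, so check which column holds ExonicFunc.refGene in your own table. It also assumes the header is the first line of the file.

```shell
# Hypothetical helper: drop rows whose ExonicFunc column is exactly
# "synonymous SNV", looking only at that one column.
# Usage: drop_synonymous <table.tsv> <column_number>
drop_synonymous() {
    # NR == 1 keeps the header; every other row is kept unless the
    # chosen column equals "synonymous SNV" exactly.
    awk -F'\t' -v col="$2" 'NR == 1 || $col != "synonymous SNV"' "$1"
}
```

For example, drop_synonymous filename.txt 9 > new_file.txt would apply it to column 9 (a placeholder number). Because it keeps the header itself, the separate header grep is not needed with this variant, and a "synonymous SNV" string sitting in some other column no longer removes a row.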

Finally go for awk -F'\t' '($11=="." || $11<=0.01 || $11=="gnomad_allele_frequency")' new_file.txt > final_file.txt. The last condition keeps the header line, so replace gnomad_allele_frequency with the actual header text of that column. Also make sure you change the column number from $11 to whichever column holds the gnomAD allele frequency in your file, and that will give you the desired results.

Please note: $11=="." gives you novel variants (in case you need those too)
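
If you would rather not hard-code the column number and header text, the same frequency filter can be wrapped in a small function. This is only a sketch under a few assumptions: filter_by_af is a made-up name, the header is taken to be the first line, and the column number you pass is whatever your own file uses.

```shell
# Hypothetical helper: keep the header plus rows whose allele frequency is
# missing (".") or at most the given threshold.
# Usage: filter_by_af <table.tsv> <af_column_number> [max_af]
filter_by_af() {
    awk -F'\t' -v col="$2" -v max="${3:-0.01}" '
        NR == 1     { print; next }   # header line, kept as-is
        $col == "." { print; next }   # no gnomAD entry (novel variant)
        # Only values that look numeric are compared; adding 0 forces a
        # numeric comparison instead of a string comparison.
        $col ~ /^[0-9.eE+-]+$/ && ($col + 0) <= (max + 0) { print }
    ' "$1"
}
```

For example, filter_by_af new_file.txt 11 > final_file.txt uses the default cutoff of 0.01. Note that <= keeps variants at exactly 0.01; if you want strictly lower than 0.01, as in the original question, change <= to < in the last rule.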

Hope this helps :)
