Subsetting VCF annotated by ANNOVAR
7.1 years ago
emyli ▴ 10

Hi,

I have a VCF that has been annotated with ANNOVAR, and I am trying to subset it based on the value of the "Gene.refGene=" key in the INFO column. I am trying to do this with bcftools (see below). However, the VCF header does not mention this field, or any of the other fields added by ANNOVAR, so the command does not work. Can anyone offer any advice?

bcftools view -o test -Ov --include "INFO/Gene.refGene ~ GENE/i" emily.vcf

I've only included an example INFO field below in the interest of space.

AC=1;AF=5.882e-04;AN=1700;BaseQRankSum=-1.981e+00;ClippingRankSum=0.00;DP=60078;ExcessHet=15.2044;FS=5.441;InbreedingCoeff=-0.0379;MLEAC=1;MLEAF=5.882e-04;MQ=33.47;MQRankSum=1.98;NEGATIVE_TRAIN_SITE;QD=2.02;ReadPosRankSum=1.98;SOR=0.027;VQSLOD=-4.309e+00;culprit=DP;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=NONE\x3bDDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1727;ExonicFunc.refGene=.;AAChange.refGene=.;cytoBand=1p36.33;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;gnomAD_exome_ALL=.;gnomAD_exome_AFR=.;gnomAD_exome_AMR=.;gnomAD_exome_ASJ=.;gnomAD_exome_EAS=.;gnomAD_exome_FIN=.;gnomAD_exome_NFE=.;gnomAD_exome_OTH=.;gnomAD_exome_SAS=.;gnomAD_genome_ALL=4.001e-05;gnomAD_genome_AFR=0;gnomAD_genome_AMR=0;gnomAD_genome_ASJ=0;gnomAD_genome_EAS=0;gnomAD_genome_FIN=0;gnomAD_genome_NFE=8.381e-05;gnomAD_genome_OTH=0;avsnp147=.;SIFT_score=.;SIFT_converted_rankscore=.;SIFT_pred=.;Polyphen2_HDIV_score=.;Polyphen2_HDIV_rankscore=.;Polyphen2_HDIV_pred=.;Polyphen2_HVAR_score=.;Polyphen2_HVAR_rankscore=.;Polyphen2_HVAR_pred=.;LRT_score=.;LRT_converted_rankscore=.;LRT_pred=.;MutationTaster_score=.;MutationTaster_converted_rankscore=.;MutationTaster_pred=.;MutationAssessor_score=.;MutationAssessor_score_rankscore=.;MutationAssessor_pred=.;FATHMM_score=.;FATHMM_converted_rankscore=.;FATHMM_pred=.;PROVEAN_score=.;PROVEAN_converted_rankscore=.;PROVEAN_pred=.;VEST3_score=.;VEST3_rankscore=.;MetaSVM_score=.;MetaSVM_rankscore=.;MetaSVM_pred=.;MetaLR_score=.;MetaLR_rankscore=.;MetaLR_pred=.;M-CAP_score=.;M-CAP_rankscore=.;M-CAP_pred=.;CADD_raw=.;CADD_raw_rankscore=.;CADD_phred=.;DANN_score=.;DANN_rankscore=.;fathmm-MKL_coding_score=.;fathmm-MKL_coding_rankscore=.;fathmm-MKL_coding_pred=.;Eigen_coding_or_noncoding=.;Eigen-raw=.;Eigen-PC-raw=.;GenoCanyon_score=.;GenoCanyon_score_rankscore=.;integrated_fitCons_score=.;integrated_fitCons_score_rankscore=.;integrated_confidence_value=.;GERP++_RS=.;GERP++_RS_rankscore=.;phyloP100way_vertebrate=.;phyloP100way_vertebrate_rankscore=.;phyloP20way_mammalian=.;phyloP20way_mammalian_rankscore=.;phastCons100way_vertebrate=.;phastCons100way_vertebrate_rankscore=.;phastCons20way_mammalian=.;phastCons20way_mammalian_rankscore=.;SiPhy_29way_logOdds=.;SiPhy_29way_logOdds_rankscore=.;Interpro_domain=.;GTEx_V6_gene=.;GTEx_V6_tissue=.;Interpro_domain=.;esp6500siv2_ea=.;esp6500siv2_all=.;ALL.sites.2015_08=.;EUR.sites.2015_08=.;CLINSIG=.;CLNDBN=.;CLNACC=.;CLNDSDB=.;CLNDSDBID=.;ALLELE_END
annovar vcf bcftools • 3.1k views

So, with --include "INFO/Gene.refGene ~ GENE/i", your VCF line would be filtered out, because NONE\x3bDDX11L1 doesn't match GENE/i. Are you OK with this?


Furthermore, `\x3b` is the hex escape for ';' (ASCII 0x3B), presumably written that way because a literal ';' would split the INFO field. I wonder how bcftools handles this...
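To make the escapes concrete, here is a minimal shell sketch that turns the literal sequences `\x3b` and `\x3d` seen in the example INFO field back into ';' and '='. The function name `decode_annovar` is made up for illustration, and the sketch assumes these two are the only escapes that matter here. It should be applied to a single extracted value, not to the whole INFO column, since decoding the whole column would reintroduce the separators.

```shell
# decode_annovar VALUE  (hypothetical helper, not part of ANNOVAR or bcftools)
# Replace the literal four-character sequences \x3b and \x3d with ';' and '='.
decode_annovar() {
    printf '%s\n' "$1" | sed -e 's/\\x3b/;/g' -e 's/\\x3d/=/g'
}

# Values copied from the example INFO field in the question.
decode_annovar 'NONE\x3bDDX11L1'
decode_annovar 'dist\x3dNONE\x3bdist\x3d1727'
```

With the values above this should print `NONE;DDX11L1` and `dist=NONE;dist=1727`.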


Thanks for your comments. This isn't actually a line I want to include; I just used it as an example of what the INFO column looks like. The command doesn't work at all: it outputs an error saying INFO/Gene.refGene is not in the VCF header, and I am trying to understand why that is.
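For anyone hitting the same error, a rough sketch of one way to check whether the header declares the tag, and to prepare a declaration if it does not. This assumes a plain-text (uncompressed) VCF. The helper names `has_info_header` and `info_header_line`, the file name `extra_header.txt`, and the Number/Type/Description values are assumptions for illustration, not something taken from the ANNOVAR output above. The bcftools call is left as a comment because it has not been run against this file.

```shell
# has_info_header FILE TAG  (hypothetical helper)
# Succeed if FILE contains a header line declaring ##INFO=<ID=TAG,...>.
# grep -F is used for the tag so the '.' in Gene.refGene is matched literally.
has_info_header() {
    grep '^##INFO=' "$1" | grep -qF "<ID=$2,"
}

# info_header_line TAG DESCRIPTION  (hypothetical helper)
# Print an INFO declaration; Number=. and Type=String are assumed values.
info_header_line() {
    printf '##INFO=<ID=%s,Number=.,Type=String,Description="%s">\n' "$1" "$2"
}

vcf=emily.vcf
if [ -f "$vcf" ] && ! has_info_header "$vcf" Gene.refGene; then
    info_header_line Gene.refGene 'Gene.refGene annotation provided by ANNOVAR' > extra_header.txt
    # With bcftools installed, the declaration could then be appended with:
    #   bcftools annotate --header-lines extra_header.txt -Ov -o emily.hdr.vcf emily.vcf
fi
```

Once the tag is declared, the --include expression at least has a header definition to resolve against. Whether the regex itself then behaves as intended is a separate question.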
