7.1 years ago
emyli
Hi,
I have a VCF that has been annotated using ANNOVAR. I am trying to subset the VCF based on the value of the "Gene.refGene=" key in the INFO column. I am trying to use bcftools for this (see below). However, there is no mention of this field, or of any field added by ANNOVAR, in the VCF header, so the command is not working. Can anyone offer any advice?
bcftools view -o test -Ov --include "INFO/Gene.refGene ~ GENE/i" emily.vcf
I've only included an example INFO field below in the interest of space.
AC=1;AF=5.882e-04;AN=1700;BaseQRankSum=-1.981e+00;ClippingRankSum=0.00;DP=60078;ExcessHet=15.2044;FS=5.441;InbreedingCoeff=-0.0379;MLEAC=1;MLEAF=5.882e-04;MQ=33.47;MQRankSum=1.98;NEGATIVE_TRAIN_SITE;QD=2.02;ReadPosRankSum=1.98;SOR=0.027;VQSLOD=-4.309e+00;culprit=DP;ANNOVAR_DATE=2017-07-17;Func.refGene=intergenic;Gene.refGene=NONE\x3bDDX11L1;GeneDetail.refGene=dist\x3dNONE\x3bdist\x3d1727;ExonicFunc.refGene=.;AAChange.refGene=.;cytoBand=1p36.33;ExAC_ALL=.;ExAC_AFR=.;ExAC_AMR=.;ExAC_EAS=.;ExAC_FIN=.;ExAC_NFE=.;ExAC_OTH=.;ExAC_SAS=.;gnomAD_exome_ALL=.;gnomAD_exome_AFR=.;gnomAD_exome_AMR=.;gnomAD_exome_ASJ=.;gnomAD_exome_EAS=.;gnomAD_exome_FIN=.;gnomAD_exome_NFE=.;gnomAD_exome_OTH=.;gnomAD_exome_SAS=.;gnomAD_genome_ALL=4.001e-05;gnomAD_genome_AFR=0;gnomAD_genome_AMR=0;gnomAD_genome_ASJ=0;gnomAD_genome_EAS=0;gnomAD_genome_FIN=0;gnomAD_genome_NFE=8.381e-05;gnomAD_genome_OTH=0;avsnp147=.;SIFT_score=.;SIFT_converted_rankscore=.;SIFT_pred=.;Polyphen2_HDIV_score=.;Polyphen2_HDIV_rankscore=.;Polyphen2_HDIV_pred=.;Polyphen2_HVAR_score=.;Polyphen2_HVAR_rankscore=.;Polyphen2_HVAR_pred=.;LRT_score=.;LRT_converted_rankscore=.;LRT_pred=.;MutationTaster_score=.;MutationTaster_converted_rankscore=.;MutationTaster_pred=.;MutationAssessor_score=.;MutationAssessor_score_rankscore=.;MutationAssessor_pred=.;FATHMM_score=.;FATHMM_converted_rankscore=.;FATHMM_pred=.;PROVEAN_score=.;PROVEAN_converted_rankscore=.;PROVEAN_pred=.;VEST3_score=.;VEST3_rankscore=.;MetaSVM_score=.;MetaSVM_rankscore=.;MetaSVM_pred=.;MetaLR_score=.;MetaLR_rankscore=.;MetaLR_pred=.;M-CAP_score=.;M-CAP_rankscore=.;M-CAP_pred=.;CADD_raw=.;CADD_raw_rankscore=.;CADD_phred=.;DANN_score=.;DANN_rankscore=.;fathmm-MKL_coding_score=.;fathmm-MKL_coding_rankscore=.;fathmm-MKL_coding_pred=.;Eigen_coding_or_noncoding=.;Eigen-raw=.;Eigen-PC-raw=.;GenoCanyon_score=.;GenoCanyon_score_rankscore=.;integrated_fitCons_score=.;integrated_fitCons_score_rankscore=.;integrated_confidence_value=.;GERP++_RS=.;GERP++_RS_rankscore=.;phyloP100way_vertebrate=.;phyloP100way_vertebrate_rankscore=.;phyloP20way_mammalian=.;phyloP20way_mammalian_rankscore=.;phastCons100way_vertebrate=.;phastCons100way_vertebrate_rankscore=.;phastCons20way_mammalian=.;phastCons20way_mammalian_rankscore=.;SiPhy_29way_logOdds=.;SiPhy_29way_logOdds_rankscore=.;Interpro_domain=.;GTEx_V6_gene=.;GTEx_V6_tissue=.;Interpro_domain=.;esp6500siv2_ea=.;esp6500siv2_all=.;ALL.sites.2015_08=.;EUR.sites.2015_08=.;CLINSIG=.;CLNDBN=.;CLNACC=.;CLNDSDB=.;CLNDSDBID=.;ALLELE_END
So, with `--include "INFO/Gene.refGene ~ GENE/i"`, your VCF line would be filtered out, because `NONE\x3bDDX11L1` doesn't match `GENE/i`. Are you OK with this? Furthermore, `\x3b` is the hex escape for ';' (ASCII 0x3B), written that way because a literal ';' would split the INFO field. I wonder how bcftools handles this...
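To make the escaping point concrete, here is a minimal sketch in plain sh/awk (no bcftools involved) that pulls one key out of an INFO string and decodes the `\x3b` (';') and `\x3d` ('=') escapes seen in the example. The function name `info_value` is hypothetical, and the sketch assumes those two are the only escapes that matter.

```shell
# info_value INFO_STRING KEY
# Print the decoded value of KEY from a semicolon-separated INFO string.
# Assumption: only the \x3b (';') and \x3d ('=') escapes need decoding.
info_value() {
  printf '%s\n' "$1" | tr ';' '\n' | awk -v key="$2" '
    # Literal prefix test, so the "." in the key is not treated as a regex wildcard
    index($0, key "=") == 1 {
      val = substr($0, length(key) + 2)   # text after "KEY="
      gsub(/\\x3b/, ";", val)             # \x3b -> ;
      gsub(/\\x3d/, "=", val)             # \x3d -> =
      print val
    }'
}
```

For instance, `info_value "$info" Gene.refGene | grep -qi 'ddx11l1'` would give a case-insensitive test against the decoded gene list, where `$info` holds the INFO column of one record.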
Thanks for your comments. This actually isn't a line I want to include, I just used it as an example of what the INFO column looks like. The command doesn't actually work at all, and outputs an error saying INFO/Gene.refGene is not in the VCF header, and I am trying to understand why that is.
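Regarding the header error: bcftools filter expressions can only refer to INFO tags that are declared in a `##INFO` header line, so one possible workaround is to add the missing definition yourself. Below is a rough sketch using only awk. The function name is hypothetical, and the `Number=.`, `Type=String` and Description values are assumptions rather than ANNOVAR's official header text. Whether bcftools then accepts the expression is untested here.

```shell
# add_gene_refgene_header FILE.vcf
# Print the VCF with a ##INFO line for Gene.refGene inserted just before #CHROM.
# Assumption: Number=., Type=String and the Description are guesses, not
# ANNOVAR's official header definition.
add_gene_refgene_header() {
  awk '
    /^#CHROM/ {
      print "##INFO=<ID=Gene.refGene,Number=.,Type=String,Description=\"Gene name from ANNOVAR refGene annotation (assumed)\">"
    }
    { print }   # pass every original line through unchanged
  ' "$1"
}
```

Usage would look like `add_gene_refgene_header emily.vcf > emily.with_header.vcf`, where the output file name is made up. `bcftools annotate` also has a `-h/--header-lines` option that appends header lines from a file, which may be the tidier route if you need to declare many ANNOVAR fields at once.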