BCF Tools Filter on 1000Genomes Annotation
0
2
Entering edit mode
5.6 years ago
jon.klonowski ▴ 210

I am trying to filter a VCF based on the allele frequency as given by ANNOVAR annotation, which has annotations in the info column. I am doing this because I only want the rare variants for analysis. I have been trying bcf query but it is not acting well behaved....

$ bcftools view -i 'INFO/controls_AF_popmax < 0.05' File_Name.vcf.gz  | bgzip > output.vcf.gz

or

$ bcftools query File_Name.vcf.gz -f '%ID\t %INFO/Gene.ensGene\t %INFO/controls_AF_popmax\n' -i 'INFO/controls_AF_popmax < 0.05' >  output.vcf

I get output but not wat expected in terms of the filter.

I want a vcf in the end so I can make ped, map, bed, bim, fam from it using PLINK (I know how to do that part).

UPDATE: I thoguht maybe it was because of the format of the INFO/controls_AF_popmax field in the header. I tried to change the designation to the same designation AF is but that did not work either.

UPDATE: It works for 'INFO/AF > 0.05' but not for 'INFO/controls_AF_popmax > 0.05' do not know why

bcftools vcftools annotation filter • 2.4k views
ADD COMMENT
0
Entering edit mode

Hello,

could you please show us the complete header and some variants of your input file?

fin swimmer

ADD REPLY
1
Entering edit mode

Looks like there were numerous extra info fields specified in the header that were not actually present. I just purged them, going to double check the header and try this all again.

UPDATE:

AFTER UPDATING INFO IT THE IT APPEARS TO BE WORKING. Lets see

what I changed:

  1. removed superfluous headers
  2. Changed format of the Annotation of interested to Float from String:

## INFO=<ID=controls_AF_popmax,Number=.,Type=Float,Description="controls_AF_popmax annotation provided by ANNOVAR">
ADD REPLY
0
Entering edit mode

Hi

I had same problem on my vcf.

I also wanted to filter a VCF based on the allele frequency as given by ANNOVAR annotation. Like you said, I've changed my vcf INFO header (from "String" to "Float", only on site which I wanted to filter) and bcftools query (only) worked fine. BTW... excellent comment! Thank you very much!

From:

##INFO=<ID=ExAC_EAS,Number=.,Type=String,Description="ExAC_EAS annotation provided by ANNOVAR">

To:

##INFO=<ID=ExAC_EAS,Number=.,Type=Float,Description="ExAC_EAS annotation provided by ANNOVAR">

Command:

bcftools query input_edited.vcf -f '%CHROM\t%POS\t%ID\t%REF\t%ALT\t%INFO/ExAC_EAS\n' -i 'INFO/ExAC_EAS != "." & INFO/ExAC_EAS < 0.005'

With this command, my output is tabular.

I also tried bcftools view (to get a vcf output) and didn't work:

bcftools view -e 'ExAC_EAS="." & ExAC_EAS > 0.005' input_edited.vcf

I had a lot of errors like this: [W::vcf_parse_format]

I did some tests with 2 versions of vcf: before and after this modification. The error is still there in both cases.

I think the steps here are: use bcftools query, get ID's from variants of interest (in tabular format), use these ID's as query again against my original vcf to get variants in VCF format.

ADD REPLY
0
Entering edit mode

UPDATED VCF header where i deleted all the contig information and filtering information so it would fit in a single screenshot:

PREVIOUS VCF HEADER https://ibb.co/rZRtR7F

UPDATED VCF HEADER https://ibb.co/n8cWxm9

Sorry i am not sure the best way to share...

and here is a record, not including sample information:

chr1    861219  chr1:861219:G:C G   C   490.79  PASS    AC=0;AF=0.0007669;AN=922;BaseQRankSum=2.72;DP=18855;ExcessHet=3.0103;FS=3.736;InbreedingCoeff=-0.0011;MLEAC=1;MLEAF=0.0007669;MQ=60;MQRankSum=0;QD=18.88;ReadPosRankSum=0.416;SOR=2.247;VQSLOD=13.64;culprit=MQRankSum;ANNOVAR_DATE=2018-04-16;Func.refGene=intronic;Gene.refGene=SAMD11;GeneDetail.refGene=.;ExonicFunc.refGene=.;AAChange.refGene=.;Func.ensGene=intronic;Gene.ensGene=ENSG00000187634;GeneDetail.ensGene=.;ExonicFunc.ensGene=.;AAChange.ensGene=.;AF=0.0001;AF_popmax=0.0004;AF_male=0.0001;AF_female=0.0001;AF_raw=0.0001;AF_afr=0;AF_sas=0.0004;AF_amr=0;AF_eas=0;AF_nfe=0.0001;AF_fin=0;AF_asj=0.0002;AF_oth=0;non_topmed_AF_popmax=0.0004;non_neuro_AF_popmax=0.0004;non_cancer_AF_popmax=0.0004;controls_AF_popmax=0.0003;MCAP13=.;CADD13_RawScore=.;CADD13_PHRED=.;gerp++elem=.;CLNALLELEID=.;CLNDN=.;CLNDISDB=.;CLNREVSTAT=.;CLNSIG=.;ALLELE_END GT:AD:DP:GQ:PL
ADD REPLY
0
Entering edit mode

i will try to clean up my header completely and double check the positioning, maybe the annovar annotation didnt properly align the header and that is why i am getting the issue

ADD REPLY

Login before adding your answer.

Traffic: 1954 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6