How to deal with conflicting INFO field names in SnpSift?
bdolin, 2.4 years ago

I'm somewhat new to SnpSift and am looking to annotate a VCF file with gnomAD population frequencies.

My VCF file has an existing INFO.AF field:

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

and the gnomAD VCF also defines an INFO.AF field (with a different meaning):

##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate allele frequency in samples">

It looks like SnpSift's default behavior is to overwrite an existing INFO.AF wherever a record matches gnomAD, leaving me with INFO.AF values whose meaning differs from record to record.

Do I need to remove the existing INFO.AF field first, or is there a way, for instance, to rename the gnomAD annotation as I pull it in?

This is the command I'm running:

java -jar SnpSift.jar annotate gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz -info "AF" myVCF.vcf > myAnnotatedVCF.vcf

Thanks


and gnomAD VCF also defines an INFO.AF field (with a different meaning)

From my point of view, they have the very same meaning.


From the VCF v4.2 specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf):

"Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):

AF : allele frequency for each ALT allele in the same order as listed"


Unfortunately not. In one case, AF is the sample read frequency for the allele, and in the gnomAD case, AF is the population frequency for the allele.


I see what you are saying: both have the same definition. But here's the actual output. This record was NOT found in gnomAD, so the original INFO.AF is retained as the sample read frequency:

chr1    187485  rs1423991279    G   A   242.77  PASS    AC=1;AF=0.500;AN=2;BaseQRankSum=2.67;DB;DP=21;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQRankSum=-1.230e+00;QD=11.56;ReadPosRankSum=0.284;SOR=1.112;VQSLOD=2.47;culprit=FS GT:AD:DP:GQ:PL  0/1:9,12:21:99:271,0,175

Whereas this record WAS found in gnomAD, so SnpSift replaced INFO.AF, which in this case holds the population frequency:

chr1    942451  rs6672356   T   C   182.84  PASS    AC=2;AN=2;DB;DP=7;ExcessHet=3.0103;FS=0.000;MLEAC=2;MLEAF=1.00;QD=30.47;SOR=1.329;VQSLOD=1.87;culprit=DP;AF=9.99870e-01 GT:AD:DP:GQ:PL  1/1:0,6:6:18:211,18,0

Ah, I see! SnpSift is doing something a little wrong here. You could use `bcftools annotate --rename-annots` to rename the AF field before and after running SnpSift:

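    # 1. move the sample AF aside (rename file lines: "INFO/<old> <new>"; file names are placeholders)
    echo "INFO/AF old_AF" > before.txt && bcftools annotate --rename-annots before.txt myVCF.vcf -o step1.vcf
    # 2. annotate with gnomAD as before
    java -jar SnpSift.jar annotate gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.bgz -info "AF" step1.vcf > step2.vcf
    # 3. rename the gnomAD AF, then 4. restore the sample AF (in this order)
    echo "INFO/AF gnomad_AF" > after1.txt && bcftools annotate --rename-annots after1.txt step2.vcf -o step3.vcf
    echo "INFO/old_AF AF" > after2.txt && bcftools annotate --rename-annots after2.txt step3.vcf -o myAnnotatedVCF.vcf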
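The order of the renames matters: the gnomAD AF has to be moved to gnomad_AF before old_AF is renamed back to AF, otherwise both values end up under the same key. The sketch below illustrates this on the INFO column alone. It is a minimal Python stand-in, not bcftools or SnpSift: `rename_info_key`, `fake_snpsift_annotate` and `wrong_order` are hypothetical helpers, header lines are ignored, and the INFO string combines values borrowed from the two records above purely for illustration.

```python
from typing import Optional


def rename_info_key(info: str, old: str, new: str) -> str:
    """Rename every INFO entry whose key is `old` to `new`.

    INFO is a ';'-separated list of KEY=VALUE entries or bare flags such as DB.
    """
    renamed = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if key == old:
            key = new
        renamed.append(key + sep + value)
    return ";".join(renamed)


def fake_snpsift_annotate(info: str, gnomad_af: Optional[str]) -> str:
    """Hypothetical stand-in for SnpSift annotate: append AF only on a gnomAD match."""
    if gnomad_af is None:
        return info
    return info + ";AF=" + gnomad_af


def rename_workflow(info: str, gnomad_af: Optional[str]) -> str:
    info = rename_info_key(info, "AF", "old_AF")     # 1. move the sample AF aside
    info = fake_snpsift_annotate(info, gnomad_af)    # 2. annotate from gnomAD
    info = rename_info_key(info, "AF", "gnomad_AF")  # 3. rename the gnomAD AF
    return rename_info_key(info, "old_AF", "AF")     # 4. restore the sample AF


def wrong_order(info: str, gnomad_af: Optional[str]) -> str:
    info = rename_info_key(info, "AF", "old_AF")
    info = fake_snpsift_annotate(info, gnomad_af)
    info = rename_info_key(info, "old_AF", "AF")     # restore first: now two AF keys
    return rename_info_key(info, "AF", "gnomad_AF")  # ...and both get renamed


if __name__ == "__main__":
    # AC=1;AF=0.500;AN=2;DB;gnomad_AF=9.99870e-01
    print(rename_workflow("AC=1;AF=0.500;AN=2;DB", "9.99870e-01"))
    # AC=1;AF=0.500;AN=2;DB  (no gnomAD match: record unchanged)
    print(rename_workflow("AC=1;AF=0.500;AN=2;DB", None))
    # AC=1;gnomad_AF=0.500;AN=2;DB;gnomad_AF=9.99870e-01  (sample AF lost)
    print(wrong_order("AC=1;AF=0.500;AN=2;DB", "9.99870e-01"))
```

Records without a gnomAD match come out exactly as they went in, while matched records keep the sample AF and gain a separate gnomad_AF.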

Thank you!
