Tools to filter the heterozygous SNPs with biased allele depth (AD) values in vcf file.
2
0
Entering edit mode
4.9 years ago
rimgubaev ▴ 340

Hello everyone! There is the following problem in my vcf file produced by GATK: a large proportion of heterozygous sites has a bias in allele depth values (AD). For example, AD is 43,7 which means that the 43 reads support one allele while 7 support another. In order to avoid possible false calls in GWAS, I want to switch such SNPs to no-calls (./.).

I know that it is possible to write a heavy-duty R or Python script substituting the heterozygous calls (0/1) for no-calls (./.) in case of significant AD bias, let's say when minor allele to major allele ratio is greater than 0.3. However, I wonder if there are ready-to-go solutions. I tried to find such a filter in GATK but failed.

In case if you faced a similar problem or know about the ready solution please share!

vcf GATK heterozygous SNPs Allele depth AD • 3.3k views
ADD COMMENT
2
Entering edit mode
4.9 years ago

using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

 java -jar dist/vcffilterjdk.jar -rc -e 'return new VariantContextBuilder(variant).genotypes( variant.getGenotypes().stream().map(G->{ if(!G.hasAD() || !G.isHet()) return G; final int ad[]=G.getAD(); if(ad.length!=2) return G; final double treshold = 0.3; final double R= ad[0]; final double A= ad[1]; final double sum = R+A; if(sum == 0 || A/sum < treshold || (1.0 - A/sum) < treshold) { return GenotypeBuilder.createMissing(G.getSampleName(),G.getPloidy()); } return G; }).collect(Collectors.toList()) ).make();'  input.vcf
ADD COMMENT
0
Entering edit mode

Hi Pierre,

Can you clarify how the above script relates to allele balance (i.e., the proportion of reads corresponding to the reference/alternate allele), across heterozygotes? I would like to change any heterozygous call with AB<0.25 & AB>0.75 to a no-call (./.). The vcf file I am working with does not have the AB field but it does have AD.

Thank you!

ADD REPLY
0
Entering edit mode

Nice Code. Thank you! When do you expect ad.length!=2 to be true? in other words, in which case the allele depth of either reference or alternative is missing? Or in which case are there more than those two alle depth in the ad vector?

ADD REPLY
0
Entering edit mode

Or in which case are there more than those two alle depth in the ad vector?

multiallelic variants.

ADD REPLY
1
Entering edit mode
4.9 years ago

VCFtools '--freq' flag will output allele frequencies; you can then parse those values using 'awk' to replace the genotype call.

ADD COMMENT

Login before adding your answer.

Traffic: 1580 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6