filter vcf file heterozygous calls by allelic depth
1
1
Entering edit mode
8.2 years ago
molly.dunn ▴ 10

I'm wondering if anyone knows how to filter heterozygous genotype calls from a vcf file where, for either allele, there is only one read supporting the call. So the allelic depth would be n,1 or 1,n. n meaning the depth could be anything. I'm thinking that we are getting a lot of wrong heterozygous calls due to sequencing error. I tried vcffilter using the command:

vcffilter -g "! ( GT = 0/1 & AD = 1,* OR AD = *,1 )" 

No luck.

Any one else have any thoughts?

Thanks!

vcf SNP next-gen alignment • 8.2k views
ADD COMMENT
0
Entering edit mode

But then, why do you have such a vcf format? Shouldn't you have the depth for each individual allele?

ADD REPLY
0
Entering edit mode

yes I do. The genotype fields are in the following format.

GT:PL:DP:SP:AD:GQ

So a typical genotype data field would look like this:

0/1:20,0,93:4:0:3,1:23

In this case, samtools is calling the position heterozygous for sample because there are 3 reads for the reference allele and one for the variant. I'd like to convert these calls to no call (./.) if there is just one reads supporting either allele.

I hope this makes sense. Thanks again for any help.

ADD REPLY
1
Entering edit mode
8.2 years ago
igor 13k

I have not used vcffilter, but bcftools filter allows array subscripts (such as (DP4[0]+DP4[1])/(DP4[2]+DP4[3]) > 0.3), which I think is what you are looking for.

bcftools filter docs: https://samtools.github.io/bcftools/bcftools.html#filter

You have to go to the EXPRESSIONS section for various examples: https://samtools.github.io/bcftools/bcftools.html#expressions

ADD COMMENT
0
Entering edit mode

Thanks so much for your help. I've tried it a bunch of ways with bcftools and can't get it to work. Looks like it doesn't want to filter at the genotype level but at the variant level.

I tried to make it simple and just filter out all hets first.

I keep getting errors like this:

Error: the tag "INFO/[%GT" is not defined in the VCF header

The command I used was:

bcftools filter -e [%GT=0/1] in.vcf > out.vcf

Thanks again.

ADD REPLY
0
Entering edit mode

Check the example command in the docs:

bcftools view -i '%ID!="." & MAF[0]<0.01'

If you follow that format, your command should be:

bcftools filter -e 'GT=0/1' in.vcf > out.vcf
ADD REPLY
0
Entering edit mode

Thanks Igor for your help. I tried: bcftools filter -e 'GT=0/1' and it didn't do anything. The I tried: bcftools filter -e '"GT=0/1"' and it got rid of all variants where there were any het calls. Then I tried: bcftools filter -e 'FMT/GT="0/1"' and it also got rid of all variants where there was any het calls. So getting the logic to work with just the heterozygous calls isn't really working much less pairing it with the 1,n or n,1 allelic depth.

Thanks for your help. Might just need to write a perl or python script

ADD REPLY

Login before adding your answer.

Traffic: 1810 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6