How do I filter a multi-individual BCF file by genotype probabilities?
Asked 3.5 years ago by devenvyas

I have BCFs with over a hundred individuals. I want to filter the files so that any genotype call with max(GP) < 0.9 is removed. I don't want the whole site removed; I just want that individual's genotype removed at that site. I can't figure out how to do this without processing each BCF individually.

Any suggestions?

I believe that when you say you want to remove genotype data, you mean you want to set it to missing. bcftools filter can do that. Try this command (preferably with the latest version from GitHub):

bcftools filter -i 'FMT/GP>0.9' --set-GTs . -Ob -o <output BCF file> <input BCF file>

This keeps the genotypes that have a GP > 0.9 and converts the others to missing. The rule is applied to every individual in the file. Alternatively, you could filter on genotype quality (GQ), which is Phred-scaled.
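The rule in the question can be stated per genotype: take that sample's GP vector, and if its largest value is below 0.9, make the GT missing while leaving the site and the other samples untouched. Below is a minimal pure-Python sketch of that rule. It is not a bcftools feature. It assumes an uncompressed VCF text stream with GP stored as probabilities (as in the 0.669,0.327,0.004 values quoted in this thread). The function names mask_low_gp and filter_vcf_stream are hypothetical.

```python
import re
from typing import Iterable, Iterator


def mask_low_gp(line: str, threshold: float = 0.9) -> str:
    """Set GT to missing for each sample whose max(GP) is below `threshold`.

    Assumes an uncompressed VCF text line with GP given as probabilities
    (e.g. 0.669,0.327,0.004). Header lines and the site itself are kept.
    """
    if line.startswith("#"):
        return line  # header lines pass through unchanged
    newline = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # sites-only VCF: no sample columns to mask
    keys = fields[8].split(":")
    if "GT" not in keys or "GP" not in keys:
        return line  # nothing to test at this site
    gt_i, gp_i = keys.index("GT"), keys.index("GP")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        if max(gt_i, gp_i) >= len(values):
            continue  # trailing FORMAT fields omitted for this sample
        # Samples whose GP is missing ('.') give an empty list and are left as-is.
        probs = [float(p) for p in values[gp_i].split(",") if p != "."]
        if probs and max(probs) < threshold:
            # Replace every allele with '.', keeping ploidy and the / or | separator.
            values[gt_i] = re.sub(r"[^/|]+", ".", values[gt_i])
            fields[col] = ":".join(values)
    return "\t".join(fields) + newline


def filter_vcf_stream(lines: Iterable[str], threshold: float = 0.9) -> Iterator[str]:
    """Apply mask_low_gp to every line of a VCF text stream."""
    for line in lines:
        yield mask_low_gp(line, threshold)
```

To run it over a whole file, save it under a hypothetical name such as mask_gp.py. One option is then:

bcftools view in.bcf | python3 -c 'import sys, mask_gp; sys.stdout.writelines(mask_gp.filter_vcf_stream(sys.stdin))' | bcftools view -Ob -o out.bcf -

This will be much slower than a native bcftools expression. Its advantage is that the per-sample logic is explicit and easy to verify.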

This does not work.

bcftools filter -i 'FMT/GP[1-1699]>=0.9' --set-GTs . Seventh_imputations_1240K.vcf.gz

still outputs problem genotypes such as:

0|0:0.669,0.327,0.004

I don't want the whole site removed, I just want that individual genotype data removed for that site

It's not clear to me. Could you give us a short example of the input and the expected output?

Answered 3.5 years ago by 4galaxy77

bcftools +setGT test.vcf -- -t q -n . -e'FORMAT/GP>=0.90'

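Here -t q selects genotypes using the -i/-e expression, and -n . sets the selected genotypes to missing. With -e, every genotype that does not match the expression (i.e. has no GP value of at least 0.90) is selected. This should do what you need.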

This does not work either.

When I run:

bcftools +setGT Seventh_imputations_1240K.vcf.gz -- -t q -n . -e'FORMAT/GP>=0.90'

I get:

Error: FORMAT vectors must be subscripted, e.g. GP[0] or GP[*]

Even if I add the subscript [1-1699], I still find genotypes like 0|0:0.53,0.373,0.097 and 1|0:0.107,0.891,0.002 in the output.
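Both genotypes quoted above have max(GP) below 0.9 (0.53 and 0.891), so they should have been masked. A small checker that lists every remaining called genotype with max(GP) below the threshold can show whether a given command did the job. Below is a minimal Python sketch. It assumes uncompressed VCF text with a #CHROM header line and GP stored as probabilities. low_gp_calls is a hypothetical helper, not part of bcftools.

```python
import re
from typing import Iterable, List, Tuple


def low_gp_calls(lines: Iterable[str], threshold: float = 0.9) -> List[Tuple[str, str, str, str]]:
    """List called (non-missing) genotypes whose max(GP) is below `threshold`.

    Returns (CHROM, POS, sample name, sample field) tuples.
    """
    samples: List[str] = []
    hits: List[Tuple[str, str, str, str]] = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names from the header line
            continue
        if len(fields) < 10:
            continue  # no sample columns
        keys = fields[8].split(":")
        if "GT" not in keys or "GP" not in keys:
            continue
        gt_i, gp_i = keys.index("GT"), keys.index("GP")
        for name, sample in zip(samples, fields[9:]):
            values = sample.split(":")
            if max(gt_i, gp_i) >= len(values):
                continue
            # A genotype counts as called if any allele is not missing.
            called = any(a not in ("", ".") for a in re.split(r"[/|]", values[gt_i]))
            probs = [float(p) for p in values[gp_i].split(",") if p != "."]
            if called and probs and max(probs) < threshold:
                hits.append((fields[0], fields[1], name, sample))
    return hits
```

For example, with the code saved as a hypothetical check_gp.py, the following should print 0 if the filtering worked:

bcftools view out.bcf | python3 -c 'import sys, check_gp; print(len(check_gp.low_gp_calls(sys.stdin)))'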
