How To Get Count Of 0|0 From A Gvcf
2
1
Entering edit mode
10.8 years ago
win ▴ 990

hi all, I have a gVCF file and i want to know which calls are 0|0 or wildtype homozygous. any ideas how this can be computed?

thanks.

vcf • 3.6k views
ADD COMMENT
1
Entering edit mode

Re-call the variants jointly. If you do not have the BAM files you are out of luck. If you have the BAMs and don't want to re-call the variants you can check depth for each individual.

ADD REPLY
3
Entering edit mode
10.8 years ago
pd3 ▴ 350

I don't have experience with this myself, but chaining commands such 'bcftools query' and 'awk' might do it:

bcftools query -f'%ALT\t%END\t%POS\n' <file> | awk '{if($1=="." && $2!=".")n+=($2-$3+1)}END{print n}'

The assumption here is that REF records in gVCF have ALT=. and the INFO/END annotation set. The command will print the total length of REF-only stretches. Depending on what your file looks like, it may need a bit more work.

Hope this helps

ADD COMMENT
1
Entering edit mode
10.8 years ago

Using the latest gvcftools, you can run:

gzip -dc genome.vcf | ${GVCFTOOLS_BIN}/extract_variants --invert | ${GVCFTOOLS_BIN}/get_called_regions >| homref_regions.bed
ADD COMMENT

Login before adding your answer.

Traffic: 2275 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6