vcf file filter
2
1
Entering edit mode
7.1 years ago
ATCG ▴ 400

I need to filter my vcf file to include variants with at least 30 individuals in each of the possible groups: major allele homozygous, heterozygous, and minor allele homozygous; would be grateful for any input. Thanks!

filter vcf SNPs minor allele homozygotes • 2.6k views
ADD COMMENT
0
Entering edit mode
7.1 years ago
tpoterba ▴ 50

You can do this in Hail. Here's python code that should work with the 0.1 interface:

from hail import *
hc = HailContext()

vds = hc.import_vcf('my.vcf.bgz')\
        .split_multi()\ # split multiallelics, because variant_qc requires biallelics
        .variant_qc()   # conveniently produces the metrics you want

# remove sites that fail your criteria
vds = vds.filter_variants_expr('va.qc.nHomVar >= 30 && va.qc.nHet >= 30 && va.qc.nHomRef >= 30')

# export to VCF
vds.export_vcf('filtered.vcf.bgz')

Alternatively, continue downstream analysis in Hail!

ADD COMMENT
0
Entering edit mode

I am getting the following error: NameError Traceback (most recent call last) <ipython-input-21-2ace1e95a049> in <module>() 1 ----> 2 vds = hc.import_vcf('x.clean.tidy.vcf')

NameError: name 'hc' is not defined

In [22]:

ADD REPLY
0
Entering edit mode
7.1 years ago

using vcffilterjdk : http://lindenb.github.io/jvarkit/VcfFilterJdk.html

java -jar dist/vcffilterjdk.jar -e 'Map<GenotypeType,Long> map=variant.getGenotypes().stream().map(G->G.getType()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())) ; return map.containsKey(GenotypeType.HET) && map.get(GenotypeType.HET)>30 &&  map.containsKey(GenotypeType.HOM_REF) && map.get(GenotypeType.HOM_REF)>30  &&  map.containsKey(GenotypeType.HOM_VAR) && map.get(GenotypeType.HOM_VAR)>30 ;' input.vcf

and using the recent version of vcffilterjdk, it's even shorter:

java -jar dist/vcffilterjdk.jar  -e 'com.github.lindenb.jvarkit.util.Counter<String> h = new com.github.lindenb.jvarkit.util.Counter(variant.getGenotypes().stream().map(G->G.getType().name()));  return h.count("HOM_REF")>30 && h.count("HOM_VAR")>30 && h.count("HET")>30;'  input.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1234 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6