I am doing SNP analysis on whole-genome Saccharomyces cerevisiae data. I want to separate the homozygous variants from the heterozygous variants. How do I go about it?
Look at the GT field for the most likely genotype.
For a single-sample diploid VCF without multi-allelic loci, keep the header lines plus every record whose genotype is 1/1:
grep -E '^#|1/1' snps.vcf > hom-alt.vcf
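A plain grep matches the string 1/1 anywhere on the line, so it misses phased calls (1|1) and can keep records where another column happens to contain 1/1. If that matters, a minimal Python sketch that reads the GT subfield of one sample column might look like the following. The function names, the `sample_index` parameter, and the demo records are my own illustration, not part of any tool mentioned in this thread, and the sketch assumes an uncompressed, tab-delimited VCF.

```python
def classify_gt(gt):
    """Classify a GT string such as 0/1, 1|1 or ./. (phased or unphased)."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if len(set(alleles)) > 1:
        return "het"
    # All alleles identical; a haploid call such as "1" also lands here.
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def split_by_zygosity(lines, sample_index=0):
    """Return (header, hom_alt, het) lists of VCF lines for one sample column."""
    header, hom_alt, het = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        fields = line.split("\t")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record carries no genotype
        gt = fields[9 + sample_index].split(":")[fmt.index("GT")]
        kind = classify_gt(gt)
        if kind == "hom_alt":
            hom_alt.append(line)
        elif kind == "het":
            het.append(line)
    return header, hom_alt, het


def make_record(pos, sample_field):
    """Build a made-up single-sample VCF record for the demo."""
    return "\t".join(["chrI", str(pos), ".", "A", "G", "50", "PASS", ".",
                      "GT:DP", sample_field])


demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    make_record(100, "1/1:20"),
    make_record(200, "0/1:18"),
    make_record(300, "1|1:25"),
    make_record(400, "./.:0"),
]
header, hom_alt, het = split_by_zygosity(demo)
print(len(hom_alt), "hom-alt records,", len(het), "het records")
```

To use it on a real file, pass an open file handle instead of the demo list and write `header + hom_alt` and `header + het` to two output files.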
Using my tool vcffilterjs:
java -jar vcffilterjs.jar -e 'variant.getGenotype("SAMPLENAME").isHet()' your.vcf.gz > het.vcf
Alternatively, replace isHet() with isHom(), isHomVar(), or isHomRef().
Or use GATK VariantFiltration.
SnpSift is a useful tool if you want to query a multi-sample VCF, for example to keep only the sites where all samples are homozygous.
For instance, for four samples:
java -jar SnpSift.jar filter "countHom() = 4 & !(GEN[0].GT='./.') & !(GEN[1].GT='./.') & !(GEN[2].GT='./.') & !(GEN[3].GT='./.')" -f my.vcf
It's unfortunate that it also counts a no-call (./.) as a homozygote, but the !(GEN[0].GT='./.') ... part deals with that.
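For readers who want to see the logic of that filter spelled out, here is a rough Python sketch of the same idea: keep a record only if every sample has a called, homozygous genotype. It is not how SnpSift is implemented; the function names and the demo records are hypothetical, and it assumes tab-delimited VCF records with GT listed in the FORMAT column.

```python
def is_called_hom(gt):
    """True if the genotype is called (no '.') and all alleles are identical."""
    alleles = gt.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) == 1


def all_samples_called_hom(record):
    """True if every sample column of a VCF record is a called homozygote."""
    fields = record.rstrip("\n").split("\t")
    fmt = fields[8].split(":")
    if "GT" not in fmt:
        return False
    gt_idx = fmt.index("GT")
    samples = fields[9:]
    return bool(samples) and all(
        is_called_hom(sample.split(":")[gt_idx]) for sample in samples
    )


def make_record(pos, genotypes):
    """Build a made-up multi-sample VCF record for the demo."""
    return "\t".join(["chrI", str(pos), ".", "A", "G", "50", "PASS", ".", "GT"]
                     + genotypes)


records = [
    make_record(100, ["1/1", "0/0", "1/1", "1/1"]),  # all called homozygotes
    make_record(200, ["1/1", "./.", "1/1", "1/1"]),  # one no-call
    make_record(300, ["1/1", "0/1", "1/1", "1/1"]),  # one heterozygote
]
kept = [r for r in records if all_samples_called_hom(r)]
print(len(kept), "of", len(records), "records kept")
```

Because the no-call check is built into `is_called_hom`, there is no need for a separate per-sample exclusion of ./. as in the SnpSift expression.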
I tried this, but I still found "0/1" in the output VCF.
Show us your command line. Is there only one sample? Can you find the string 1/1 elsewhere?