5.0 years ago
VBer
Hello.
I made a subset of 50 samples from a 192-sample VCF file. Some of the SNPs in the new subset VCF are not variant in any of the 50 samples (i.e. they are 0/0 in all 50). I would like to remove them, preferably using an existing tool.
I tried bcftools -e 'GT[0-49]="RR"' but that removes SNPs when even one sample is 0/0.
Thanks.
Take a look at my answer, here: A: How to get sample names and genotype for SNP in multi-sample VCF file
This will help you to identify sites that are completely homozygous reference (the nHomRef column). You could then take those IDs and use them to filter the original data. Otherwise, indeed, there is likely some query that you can do with BCFtools or SnpSift. Have a look at this post.
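If no existing tool cooperates, the "find sites that are homozygous reference in every sample, then filter them out" idea can be sketched in plain Python. This is only a minimal sketch, not the method from the linked answer: the function names `has_alt_allele` and `filter_vcf_lines` are made up for illustration, it assumes an uncompressed, tab-delimited VCF with a GT entry in the FORMAT column, and it assumes that missing genotypes (`./.`) should not count as evidence of a variant.

```python
def has_alt_allele(record_line):
    """Return True if at least one sample carries a non-reference allele."""
    fields = record_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        # Assumption: without genotypes we cannot judge the site, so keep it.
        return True
    gt_index = format_keys.index("GT")
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        # Assumption: "0" is reference and "." is missing; anything else is ALT.
        if any(allele not in ("0", ".") for allele in alleles):
            return True
    return False


def filter_vcf_lines(lines):
    """Yield header lines plus records that are variant in at least one sample."""
    for line in lines:
        if line.startswith("#") or has_alt_allele(line):
            yield line
```

To use it, you would pass an open file handle for the subset VCF to `filter_vcf_lines` and write the yielded lines to a new file. A site is dropped only when every sample is 0/0 or missing, which is the "all samples" condition rather than the "any sample" condition that the expression in the question ended up applying.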
Hey Aisha, thank you. Yes, I did see that post earlier. I tried the bcftools suggestion and it didn't work. I had trouble installing Pierre's vcffilterjdk. I have yet to try SnpSift.