Remove homozygous reference SNPS present in ALL samples of multi-sample VCF file
5.0 years ago
VBer

Hello.

I took a subset of 50 samples from a 192-sample VCF file. Some of the SNPs in the new subset VCF are not present in any of the 50 samples (i.e. they are 0/0 in all 50). I would like to remove them, preferably using an existing tool.

I tried bcftools -e 'GT[0-49]="RR"', but that removes a SNP as soon as even one sample is 0/0.
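A minimal sketch of the desired filter, written in plain Python rather than with any particular tool, and assuming an uncompressed VCF: a site is dropped only when every sample's GT is homozygous reference (0/0 or 0|0). Sites where any sample has a missing genotype (./.) are kept, which is an assumption you may want to change. The function and script names are hypothetical. For a bcftools-based route, its manual describes include/exclude expressions (such as GT="alt") and a --min-ac option that may achieve the same result; check the documentation for your version.

```python
import re
import sys
from typing import Iterable, Iterator

ALLELE_SEP = re.compile(r"[/|]")


def is_hom_ref(gt: str) -> bool:
    """True if every allele in a GT string ('0/0', '0|0', '0') is the reference allele '0'."""
    return all(allele == "0" for allele in ALLELE_SEP.split(gt))


def drop_all_hom_ref(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines, skipping data lines where every sample genotype is homozygous reference."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep meta-information lines and the #CHROM header unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            yield line  # no genotypes to inspect, so keep the site
            continue
        gt_idx = fmt_keys.index("GT")
        genotypes = []
        for sample in fields[9:]:
            values = sample.split(":")
            # Trailing FORMAT values may be omitted; treat an absent GT as missing.
            genotypes.append(values[gt_idx] if gt_idx < len(values) else ".")
        if genotypes and all(is_hom_ref(gt) for gt in genotypes):
            continue  # 0/0 in every sample: drop the site
        yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python drop_hom_ref.py subset.vcf > filtered.vcf
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(drop_all_hom_ref(handle))
```

Because it streams line by line, memory use stays flat regardless of how many sites the subset VCF contains.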

Thanks.

Take a look at my answer here: How to get sample names and genotype for SNP in multi-sample VCF file

This will help you identify sites where every sample is homozygous reference (the nHomRef column). You could then take those site IDs and use them to filter the original data. Otherwise, there is likely a query that does this directly with BCFtools or SnpSift.
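The two-step approach (count homozygous-reference genotypes per site, collect the sites where that count equals the number of samples, then filter by ID) can be sketched in Python as below. This is an illustration, not the linked answer's method: the nHomRef count is recomputed here rather than taken from that answer's output, the helper names are hypothetical, the input is assumed to be uncompressed VCF text, and sites with an ID of '.' fall back to a CHROM:POS:REF:ALT key.

```python
import re
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def site_key(fields: List[str]) -> str:
    """Use the ID column, falling back to CHROM:POS:REF:ALT when ID is '.'."""
    return fields[2] if fields[2] != "." else ":".join(fields[0:2] + fields[3:5])


def hom_ref_counts(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each site to (number of 0/0 samples, total samples), i.e. an nHomRef-style count."""
    counts: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # no genotypes at this site
        gt_idx = fmt_keys.index("GT")
        n_hom_ref = 0
        for sample in fields[9:]:
            values = sample.split(":")
            gt = values[gt_idx] if gt_idx < len(values) else "."
            if all(allele == "0" for allele in re.split(r"[/|]", gt)):
                n_hom_ref += 1
        counts[site_key(fields)] = (n_hom_ref, len(fields) - 9)
    return counts


def all_hom_ref_ids(counts: Dict[str, Tuple[int, int]]) -> Set[str]:
    """Sites where every sample is homozygous reference (nHomRef == number of samples)."""
    return {key for key, (n_hom_ref, n_samples) in counts.items()
            if n_samples > 0 and n_hom_ref == n_samples}


def filter_out_ids(lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield header lines plus every data line whose site key is not in `drop`."""
    for line in lines:
        if line.startswith("#") or site_key(line.rstrip("\n").split("\t")) not in drop:
            yield line
```

Since the data is read twice (once to count, once to filter), either load the lines into a list first or open the file twice.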

Hey Aisha, thank you. Yes, I did see that post earlier. I tried the bcftools suggestion and it didn't work. I had trouble installing Pierre's vcffilterjdk, and I have yet to try SnpSift.
