Hello,
I have a VCF file that contains 200 DO mouse samples. I want to keep only the SNPs that have at least 5 samples of each genotype, so each SNP needs at least 5 AA, 5 AB, and 5 BB calls. For example, a SNP with 190 AA, 6 AB, and 4 BB would be discarded, and so would a SNP with 100 AA, 0 AB, and 100 BB. How would I go about doing this? I have been trying with vcftools, but I have not quite gotten it to work. The rule doesn't have to be exact; I am just trying to keep the SNPs that give me the most information for telling cell lines apart.
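To make the rule concrete, here is a rough Python sketch of the filter I have in mind. It is only an illustration under some assumptions: the SNPs are biallelic, the VCF is plain tab-separated text with GT as the first FORMAT field, and AA, AB, and BB correspond to 0/0, 0/1, and 1/1. The function names and the MIN_COUNT constant are made up for this example and do not come from vcftools or any other tool.

```python
MIN_COUNT = 5  # assumed threshold: minimum samples required per genotype class


def genotype_counts(sample_fields):
    """Count AA (0/0), AB (0/1 or 1/0), and BB (1/1) calls for one SNP."""
    counts = {"AA": 0, "AB": 0, "BB": 0}
    for field in sample_fields:
        # GT is assumed to be the first colon-separated value;
        # treat phased (|) and unphased (/) calls the same way.
        gt = field.split(":")[0].replace("|", "/")
        if gt == "0/0":
            counts["AA"] += 1
        elif gt in ("0/1", "1/0"):
            counts["AB"] += 1
        elif gt == "1/1":
            counts["BB"] += 1
        # Missing calls (./.) and any other alleles are not counted.
    return counts


def keep_snp(line, min_count=MIN_COUNT):
    """Return True if every genotype class has at least min_count samples."""
    fields = line.rstrip("\n").split("\t")
    counts = genotype_counts(fields[9:])  # sample columns start at column 10
    return all(n >= min_count for n in counts.values())


def filter_vcf(lines, min_count=MIN_COUNT):
    """Yield header lines unchanged and only the SNP lines that pass the rule."""
    for line in lines:
        if line.startswith("#") or keep_snp(line, min_count):
            yield line
```

With this sketch, passing the lines of an open VCF file to filter_vcf and writing the results to a new file would drop both of the example SNPs above, since each has a genotype class with fewer than 5 samples.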
Any help would be greatly appreciated.
Thank you