Hi, I have a set of VCF files, one per individual, with variants called for the HLA genes. HLA has a lot of multiallelic SNPs, but in this case I need to keep only the SNPs that are biallelic across all of the VCF files. Is there a specific tool for this?
Example:
sample 1  rs-xx  G C
sample 1  rs-yy  T C
sample 2  rs-xx  A C
sample 2  rs-yy  T C
In this case I want only rs-yy as the result, because rs-xx shows three alleles (G, A, C) across the two samples.
Thanks a lot in advance.
Thanks a lot. Is merging over 500 samples feasible?
Yes, although htslib is likely to be much faster for such a task.
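For anyone who wants to see the cross-file check spelled out, here is a minimal Python sketch of the idea, not a replacement for a dedicated tool. It assumes standard tab-delimited VCF records (plain text or gzip/bgzip compressed), identifies a site by chromosome and position, and treats the union of REF and ALT alleles over all files as the alleles seen at that site. It ignores genotypes and filters, and it drops any position where an indel or other non-SNP allele appears. The function names `open_vcf` and `biallelic_snps` are made up for this example.

```python
import gzip
from collections import defaultdict


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF as text."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def biallelic_snps(vcf_paths):
    """Return {(chrom, pos): [allele, allele]} for sites that show exactly
    two single-base alleles across all input files.

    Assumption: the alleles at a site are the union of the REF and ALT
    columns over every file; genotypes and FILTER values are not inspected.
    """
    alleles = defaultdict(set)
    for path in vcf_paths:
        with open_vcf(path) as handle:
            for line in handle:
                # Skip meta-information, the header line, and blank lines.
                if line.startswith("#") or not line.strip():
                    continue
                fields = line.rstrip("\n").split("\t")
                chrom, pos = fields[0], int(fields[1])
                ref, alt = fields[3], fields[4]
                site = alleles[(chrom, pos)]
                site.add(ref.upper())
                for allele in alt.split(","):
                    if allele != ".":  # "." means no ALT allele in this file
                        site.add(allele.upper())

    # Keep sites with exactly two alleles, both single nucleotides.
    return {
        site: sorted(seen)
        for site, seen in alleles.items()
        if len(seen) == 2 and all(len(a) == 1 and a in "ACGT" for a in seen)
    }
```

Calling `biallelic_snps(["sample1.vcf", "sample2.vcf"])` (hypothetical file names) returns a dictionary keyed by `(chrom, pos)`. Memory use grows with the number of distinct sites rather than the number of samples, so looping over 500 per-individual files this way should be workable for a region the size of HLA.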