4.6 years ago
kadamek49
Hello,
I have a VCF file with 3 samples and would like to filter out variants whose genotype (GT) is the same across all 3 samples (e.g. 0/1 0/1 0/1). I am looking for differences between the samples, so I want to keep only variants where at least one sample has a different genotype (e.g. 0/0 0/0 0/1). Most of my genotypes are heterozygous. Does anyone have suggestions on how to do this?
Thank you!
Definitely not an ideal solution: it only applies when the VCF has exactly 3 samples and no phasing.
perl -lane '{if($_ =~ /^#/){print }else{my %geno=map{[split /:/,$_]->[0]=>0 } @F[-3 .. -1];if(scalar keys %geno != 1){print } } }' test.vcf
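It splits each line, takes the last 3 columns, and stores each sample's genotype as a key in a hash. If the hash ends up with exactly 1 key, all three genotypes are the same and the line is skipped; otherwise it is printed. Header lines (starting with #) are always printed.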
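For anyone who finds the one-liner hard to read, here is a rough Python sketch of the same idea. It is not a tested replacement for the Perl above. It assumes tab-delimited lines, sample columns at the end of each line, and GT as the first colon-separated subfield of each sample. The function names `genotypes_differ` and `filter_vcf` and the `n_samples` parameter are made up for illustration.

```python
def genotypes_differ(line, n_samples=3):
    """Return True if the GT fields of the last n_samples columns are not all identical."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: GT is the first colon-separated subfield of each sample column.
    genotypes = {sample.split(":")[0] for sample in fields[-n_samples:]}
    return len(genotypes) != 1


def filter_vcf(lines, n_samples=3):
    """Yield header lines plus records whose sample genotypes are not all the same."""
    for line in lines:
        if line.startswith("#") or genotypes_differ(line, n_samples):
            yield line
```

One way to use it would be `with open("test.vcf") as fh: sys.stdout.writelines(filter_vcf(fh))` (after `import sys`). Like the Perl version, it compares genotype strings literally, so 0/1 and 1/0 count as different.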
Thank you! This worked and accomplished what I was asking. Really appreciate it!