Hi,
I'm trying to filter a multisample VCF (8 samples) for sites where two populations (4 samples per population) have opposite genotypes, while keeping the output in VCF format.
What I want to achieve is as follows:
1. Keep sites where all samples in Population A have the same genotype and no sample in Population B has that genotype.
Example: 1/1 in all samples of Population A (s1-s4) and no 1/1 in Population B (a1-a4); the genotypes in Population B can be a mixture of 0/0 and 0/1.
Desired genotype output
s1 s2 s3 s4 a1 a2 a3 a4
1/1 1/1 1/1 1/1 0/0 0/0 0/0 0/0
1/1 1/1 1/1 1/1 0/0 0/0 0/0 0/1
..
2. Flip the filter in (1): all samples in Population B have the same genotype and no sample in Population A has that genotype.
Desired genotype output
s1 s2 s3 s4 a1 a2 a3 a4
0/0 0/0 0/0 0/0 1/1 1/1 1/1 1/1
0/0 0/0 0/0 0/1 1/1 1/1 1/1 1/1
..
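To make the two criteria concrete, here is a minimal Python sketch of the rule as I understand it. It is not tied to any tool: the function names (`gt_alleles`, `is_opposite`, `filter_vcf`) are made up for illustration, and it assumes the sample columns are in the order s1-s4, a1-a4, that GT is the first FORMAT field, and that a missing genotype (./.) in either population rejects the site.

```python
def gt_alleles(sample_field):
    """Return the sorted alleles of a sample column, e.g. '1|0:35' -> ('0', '1')."""
    gt = sample_field.split(":", 1)[0].replace("|", "/")
    return tuple(sorted(gt.split("/")))


def is_opposite(fixed_pop, other_pop):
    """True if every sample in fixed_pop has one identical, called genotype
    and no sample in other_pop has that genotype (or a missing call)."""
    first = fixed_pop[0]
    if "." in first:  # assumption: missing calls disqualify the site
        return False
    if any(g != first for g in fixed_pop):
        return False
    return all(g != first and "." not in g for g in other_pop)


def filter_vcf(lines, pop_a=(0, 1, 2, 3), pop_b=(4, 5, 6, 7)):
    """Yield header lines unchanged, plus records matching criterion 1 or 2."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header so the output stays valid VCF
            continue
        samples = line.rstrip("\n").split("\t")[9:]  # sample columns start at column 10
        a = [gt_alleles(samples[i]) for i in pop_a]
        b = [gt_alleles(samples[i]) for i in pop_b]
        if is_opposite(a, b) or is_opposite(b, a):
            yield line


if __name__ == "__main__":
    import sys

    # Usage: python filter_opposite.py < sample.vcf > filtered.vcf
    sys.stdout.writelines(filter_vcf(sys.stdin))
```

To get criterion 1 or criterion 2 alone, keep only `is_opposite(a, b)` or `is_opposite(b, a)` in the final condition.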
I tried SnpSift, but the filter missed 26 genotype combinations when the 1/1 genotype is involved.
The command I used is as follows:
cat sample.vcf | \
java -Xmx4g -jar SnpSift.jar filter \
"(
((countHom()>3) & (countHom()<5)) |
(countHet()=4) |
(countRef()=4)) &
(
(isRef(GEN[0]) & isRef(GEN[1]) & isRef(GEN[2]) & isRef(GEN[3])
) |
(isHom(GEN[0]) & isVariant(GEN[0]) & isHom(GEN[1])& isVariant(GEN[1]) & isHom(GEN[2]) & isVariant(GEN[2])& isHom(GEN[3]) & isVariant(GEN[3])) |
(isHet(GEN[0]) & isHet(GEN[1]) & isHet(GEN[2]) & isHet(GEN[3]))| (isRef(GEN[4]) & isRef(GEN[5]) & isRef(GEN[6]) & isRef(GEN[7])) |
(isHom(GEN[4]) & isVariant(GEN[4]) & isHom(GEN[5])& isVariant(GEN[5]) & isHom(GEN[6]) & isVariant(GEN[6])& isHom(GEN[7]) & isVariant(GEN[7])) |
(isHet(GEN[4]) & isHet(GEN[5]) & isHet(GEN[6]) & isHet(GEN[7]))
)"
Does anyone have experience with this kind of filtering? I appreciate any help or advice.
Many thanks
Thanks for all the help and advice.
Script for the opposite filtering with https://github.com/lindenb/jvarkit/wiki/VCFFilterJS: