Dear all,
I have a multi-sample VCF file from which i need to extract shared private variants(variants shared only a specific group) . I have used the below command to find shared variants.
java -Xmx4g -jar GenomeAnalysisTK.jar -T SelectVariants -R ref.fa --variant goodAA.final.vcf -select 'vc.getGenotype("LJ1A2").isHomVar() && vc.getGenotype("LJA3").isHomVar()' > shared.vcf
This gives the shared variants in LJ1A2 and LJA3. And I can use the below expression to find shared private variants
'vc.getGenotype("LJ1A2").isHomVar() && vc.getGenotype("LJA3").isHomVar() && vc.getGenotype("JLA4").isHomRef()'
this gives the shared hom varaints in LJ1A2, LJA3 and REF in JLA4. i.e. shared private hom variants with respect to one control (JLA4).
How can specify all the other samples (100's) in the vcf file to have '.isHomRef()' except those specified to have '.isHomVar' ?
It would be complex to specify each and every sample manually when there are >200 samples. COuld someone suggest a smoother way of doing this by using any wild cards?
Tried to use the above script using vcffilterjs. but the installation failed with long error message with few shown below:
Then tried with picard, it worked but the output looks wierd which writes all the variants in the input to the output in a different format for the FORMAT columns as shown below:
which does not output any private variants.
Could you suggest further.
The manual says : java 1.8 is required. Did you check that ?
picard adds an item in the FILTER column.
jvarkit: Yes, i did used Java 8:
Picard: Yes, it has
AllGtsFiltered
andAllGtsFiltered;JavaScript
items in FILTER column. Then i picked the variants withAllGtsFiltered
in FILTER column . And it does the trick. Thanks!!!I tried to get heterozygous variants by using
if( g.isHomRef() || g.isNoCall() || g.isHetVar())
which is not valid. and could not find a valid expression in the documentation. Any suggestion?doc about genotype is here: https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/Genotype.html
so, instead of
g.isHetVar()
, tryg.isHet()
Tried
if( g.isHomRef() || g.isNoCall() || g.isHet())
andif( g.isHomRef() || g.isNoCall() || g.isHetNonRef())
. In both cases it worked without any error but all the input variants haveAllGtsFiltered
which does not make sense. The idea is to have all the other samples to be 0/0 or 0/1 or ./.would select ANY genotype but the
g.isHomVar()
i did not understand your previous post. The aim is that the selected samples should have
1/1
and the rest can be0/1
or0/0
or./.
could you briefly explain.