Hello,
I have a multisample VCF file (3 samples) and I'd like to get all de novo variants for a specific sample. I tried bcftools:
bcftools view -s SAMPLE_ID -x All-final.vcf
The problem is that some sites can be multiallelic, so the above command would not find, for example, this line (the third sample is the one I'm interested in):
chr5 38528951 rs762238623 GACAC GAC,G 1204.93 PASS . GT:DP:AD:RO:QR:AO:QA:GL 0/1:10:2,6,0:2:75:6,0:212,0:-15.9529,0,-3.87444,-16.555,-5.68062,-22.6541 0/1:40:10,21,3:10:343:21,3:677,105:-51.4565,0,-21.4531,-45.9266,-19.2432,-72.8601 1/2:39:0,19,10:0:0:19,10:622,279:-72.2747,-22.0435,-16.3239,-50.201,0,-47.1907
This is because the requirement of the -x option is not fulfilled:
-x, --private print sites where only the subset samples carry an non-reference allele. Requires --samples or --samples-file.
So, what's the best way to find all de novo variants in a given sample?
Thanks.
fin swimmer
Hello Pierre,
Checking for Mendelian violations is not exactly what I asked for, but it is also very useful for further needs.
Thanks a lot.
fin swimmer
What's your definition of 'de novo variants' without the context of a trio?
Without the context of a trio, my definition of de novo is a non-reference allele that occurs in only one specific sample compared to the other samples in the multisample VCF.
But within the context of a trio you are absolutely right that every Mendelian violation is at least suspicious.
fin swimmer
I would say it's a "rare variant" :-)
Ok, if this is the right term :)
Even if my initial problem is solved, I'm still interested in how to filter those rare variants for a given sample within a multisample, multiallelic VCF.
fin swimmer
Still with GATK SelectVariants, using the option
-select
Something like '-select "AC<1"'; see the GATK doc / JEXL.
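For the allele-level question (an ALT allele that only the sample of interest carries, even at a multiallelic site), here is a minimal Python sketch that looks only at the GT field. It is not a bcftools or GATK feature, just plain text parsing. The function names and the sample names S1, S2, S3 are made up for illustration, and the example record is the chr5 line from the question, trimmed to GT:DP and written with tabs.

```python
import io


def private_alt_alleles(vcf_line, target_index):
    """Return the ALT alleles called in the target sample's GT and in no other sample's GT."""
    fields = vcf_line.rstrip("\n").split("\t")
    alts = fields[4].split(",")                # ALT column, e.g. "GAC,G"
    gt_pos = fields[8].split(":").index("GT")  # position of GT within FORMAT
    samples = fields[9:]

    def called_alt_indices(sample_field):
        # Accept unphased (/) and phased (|) genotypes; skip missing (.) and reference (0).
        gt = sample_field.split(":")[gt_pos]
        return {int(a) for a in gt.replace("|", "/").split("/")
                if a.isdigit() and int(a) > 0}

    target = called_alt_indices(samples[target_index])
    others = set()
    for i, sample_field in enumerate(samples):
        if i != target_index:
            others |= called_alt_indices(sample_field)
    # Allele index 1 corresponds to the first ALT allele.
    return [alts[i - 1] for i in sorted(target - others)]


def private_sites(lines, sample_name):
    """Yield (record, private_alleles) for every record with at least one private ALT allele."""
    target_index = None
    for line in lines:
        if line.startswith("##"):
            continue
        if line.startswith("#CHROM"):
            # Sample names start at column 10 of the header line.
            target_index = line.rstrip("\n").split("\t")[9:].index(sample_name)
            continue
        private = private_alt_alleles(line, target_index)
        if private:
            yield line.rstrip("\n"), private


# Hypothetical sample names; the record is the chr5 site from the question, trimmed to GT:DP.
HEADER = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
                    "INFO", "FORMAT", "S1", "S2", "S3"])
EXAMPLE = "\t".join(["chr5", "38528951", "rs762238623", "GACAC", "GAC,G",
                     "1204.93", "PASS", ".", "GT:DP", "0/1:10", "0/1:40", "1/2:39"])

for record, alleles in private_sites(io.StringIO(HEADER + "\n" + EXAMPLE + "\n"), "S3"):
    print(record.split("\t")[1], alleles)
```

On the example line this reports the second ALT allele (G) for the third sample, because the first ALT allele (GAC) is also called in the other two samples. Note that this only compares called genotypes: in the original record the second sample has AD 10,21,3, so three reads there support G even though its GT is 0/1. An additional depth or allele-count check may be worth adding before trusting such a call.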