Filter multisample VCF for de novo variants
7.1 years ago

Hello,

I have a multisample VCF file (3 samples) and I'd like to get all de novo variants for a specific sample. I tried bcftools:

bcftools view -s SAMPLE_ID -x All-final.vcf

The problem is that some sites can be multiallelic, so the above command would, for example, not find this line (the third sample is the one I'm interested in):

chr5    38528951    rs762238623 GACAC   GAC,G   1204.93 PASS    .   GT:DP:AD:RO:QR:AO:QA:GL 0/1:10:2,6,0:2:75:6,0:212,0:-15.9529,0,-3.87444,-16.555,-5.68062,-22.6541   0/1:40:10,21,3:10:343:21,3:677,105:-51.4565,0,-21.4531,-45.9266,-19.2432,-72.8601   1/2:39:0,19,10:0:0:19,10:622,279:-72.2747,-22.0435,-16.3239,-50.201,0,-47.1907

This is because the requirements of -x are not fulfilled:

-x, --private print sites where only the subset samples carry an non-reference allele. Requires --samples or --samples-file.
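
To make the difference concrete, here is a minimal Python sketch of the two tests applied to the genotypes of the line above. It assumes the site-level reading of the -x description quoted above (a site qualifies only if no other sample carries any non-reference allele) and compares it with a per-allele check. The function names are hypothetical and this is not how bcftools is implemented.

```python
def alt_alleles(gt):
    """Return the set of non-reference allele indices in a GT string
    such as '0/1' or '1|2'. Missing calls ('.') are ignored."""
    calls = gt.replace("|", "/").split("/")
    return {int(a) for a in calls if a not in (".", "0")}


def private_alt_alleles(genotypes, target):
    """ALT indices carried by genotypes[target] and by no other sample."""
    others = set()
    for i, gt in enumerate(genotypes):
        if i != target:
            others |= alt_alleles(gt)
    return alt_alleles(genotypes[target]) - others


# GT fields of the three samples on the chr5:38528951 line.
gts = ["0/1", "0/1", "1/2"]
alts = ["GAC", "G"]  # ALT column, so allele index 1 is GAC and 2 is G
target = 2           # zero-based index of the third sample

# Site-level test: does any other sample carry a non-reference allele?
site_is_private = not any(
    alt_alleles(gt) for i, gt in enumerate(gts) if i != target
)

# Allele-level test: which ALT alleles occur only in the target sample?
private = private_alt_alleles(gts, target)

print(site_is_private)                        # False
print([alts[i - 1] for i in sorted(private)])  # ['G']
```

The site as a whole is not private because the other two samples carry GAC, but the G allele occurs only in the third sample, which is the case I would like to catch.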

So, what's the best way here to find all denovo variants in a given sample?

Thanks.

fin swimmer

vcf bcftools SNP • 2.9k views
7.1 years ago

Try GATK SelectVariants: https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_variantutils_SelectVariants.php

Generating a VCF of all the variants that are mendelian violations. The optional argument '-mvq' restricts the selection to sites that have a QUAL score of 50 or more

 java -jar GenomeAnalysisTK.jar \
   -T SelectVariants \
   -R reference.fasta \
   -V input.vcf \
   -ped family.ped \
   -mv -mvq 50 \
   -o violations.vcf
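
For readers unfamiliar with the term, the idea behind a Mendelian violation can be sketched in a few lines of Python. This is only an illustration of the concept, not GATK's implementation: it assumes diploid, autosomal genotypes, treats any trio with a missing call as "not a violation", and the function names are hypothetical.

```python
from itertools import product


def alleles(gt):
    """Split a diploid GT string such as '0/1' or '1|2' into allele
    indices. Returns None if a call is missing or the GT is not diploid."""
    calls = gt.replace("|", "/").split("/")
    if len(calls) != 2 or "." in calls:
        return None
    return [int(a) for a in calls]


def is_mendelian_violation(child_gt, father_gt, mother_gt):
    """True if the child's genotype cannot be formed from one paternal
    and one maternal allele."""
    child = alleles(child_gt)
    father = alleles(father_gt)
    mother = alleles(mother_gt)
    if child is None or father is None or mother is None:
        return False  # assumption: incomplete trios are not flagged
    # Every unordered genotype the parents could transmit.
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) not in possible


# Hypothetical trio: both parents 0/1, child 1/2.
print(is_mendelian_violation("1/2", "0/1", "0/1"))  # True
print(is_mendelian_violation("0/1", "0/0", "0/1"))  # False
```

A child allele that neither parent carries, such as allele 2 in the first call, is what makes the site a candidate de novo variant.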

I've also written: http://lindenb.github.io/jvarkit/VCFTrios.html


Hello Pierre,

Checking for Mendelian violations is not exactly what I asked for, but it is also very useful for other needs.

Thanks a lot.

fin swimmer


What's your definition of 'de novo variants' without the context of a trio?


Without the context of a trio, my definition of de novo is a non-reference allele that occurs in only one specific sample compared to the other samples in the multisample VCF.

But within the context of a trio you are absolutely right that every Mendelian violation is at least suspicious.

fin swimmer


"de novo is a non-reference allele that occurs in only one specific sample compared to the other samples in the multisample VCF."

I would say that's a "rare variant" :-)


Ok, if this is the right term :)

Even though my initial problem is solved, I'm still interested in how to filter those rare variants for a given sample within a multisample, multiallelic VCF.

fin swimmer


Still with GATK SelectVariants, using the -select option, something like '-select "AC<1"'; see the GATK documentation on JEXL expressions.

6.0 years ago
Chadi Saad ▴ 110

use genmod to annotate your variants with genetic models:
