Phasing and de novo finding from VCF file
4.4 years ago
Maxime • 0

Hello,

I want to analyse missense de novo variants in the context of autism spectrum disorder, and I have access to VCF files that are already annotated with SnpEff. Each VCF contains at least a trio (father, mother, proband) and sometimes an unaffected sibling.

Do I need to phase before looking for de novo variants? Which tools can I use to do so? HaplotypeCaller for phasing and VariantAnnotator for de novo detection?

Is it also possible to get the SnpEff html report from an already annotated file?

And finally, what can I use to get a "nicer" output from a VCF file, to list every missense de novo, synonymous, ...

Thank you,

Maxime

vcf de novo snpeff • 2.1k views
4.4 years ago

Do I need to phase before looking for de novo?

No, not for simple detection.

Which tools can I use to do so?

gatk VariantAnnotator with --annotation PossibleDeNovo and a pedigree file: https://gatk.broadinstitute.org/hc/en-us/articles/360036889772-PossibleDeNovo
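As a rough illustration of the command suggested above, here is a minimal Python sketch that assembles a GATK4 VariantAnnotator call. The file names (ref.fasta, trio.vcf.gz, trio.ped, trio.denovo.vcf.gz) and the helper name are placeholders, not taken from the thread, and the exact options should be checked against the GATK version in use.

```python
import shlex


def build_possible_denovo_command(reference, vcf_in, ped, vcf_out):
    """Return the argument list for a GATK4 VariantAnnotator run
    that requests the PossibleDeNovo annotation."""
    return [
        "gatk", "VariantAnnotator",
        "-R", reference,                   # reference FASTA (placeholder name)
        "-V", vcf_in,                      # multi-sample VCF containing the trio
        "--pedigree", ped,                 # PED file describing the family
        "--annotation", "PossibleDeNovo",  # annotation named in the answer
        "-O", vcf_out,                     # annotated output VCF
    ]


cmd = build_possible_denovo_command(
    "ref.fasta", "trio.vcf.gz", "trio.ped", "trio.denovo.vcf.gz"
)
print(shlex.join(cmd))
# To actually run it (requires gatk on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The sample IDs in the PED file have to match the sample names in the VCF header for the trio to be recognised.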

Is it also possible to get the SnpEff html report from an already annotated file?

No

And finally, what can I use to get a "nicer" output from a VCF file, to list every missense de novo, synonymous, ...

I wrote VCF2table: http://lindenb.github.io/jvarkit/VcfToTable.html
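If a quick script is preferred over a dedicated tool, the sketch below shows one way to list missense and synonymous calls from a SnpEff-annotated VCF. It assumes the annotation is stored in the standard ANN INFO field (older SnpEff versions used EFF instead), and the example record, including the gene name GENE1 and transcript TX1, is made up for illustration.

```python
def snpeff_effects(info_field):
    """Yield (allele, effect, impact, gene) tuples from a SnpEff ANN entry."""
    for entry in info_field.split(";"):
        if not entry.startswith("ANN="):
            continue
        for ann in entry[len("ANN="):].split(","):
            fields = ann.split("|")
            if len(fields) < 4:
                continue  # malformed annotation, skip it
            allele, effects, impact, gene = fields[:4]
            # Several effects on one transcript are joined with '&'
            for effect in effects.split("&"):
                yield allele, effect, impact, gene


def vcf_to_rows(lines, wanted=("missense_variant", "synonymous_variant")):
    """Return one row per (variant, annotation) whose effect is in `wanted`."""
    rows = []
    for line in lines:
        if line.startswith("#"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref = cols[:4]
        info = cols[7]
        for allele, effect, impact, gene in snpeff_effects(info):
            if effect in wanted:
                rows.append((chrom, int(pos), ref, allele, gene, effect, impact))
    return rows


# Hypothetical record: the ANN content is invented for this example.
example = [
    "##fileformat=VCFv4.2",
    "12\t2721137\t.\tC\tT\t41828.10\t.\t"
    "ANN=T|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|1/1|c.1C>T|p.Pro1Ser|||||",
]
rows = vcf_to_rows(example)
for row in rows:
    print("\t".join(map(str, row)))
```

A variant annotated on several transcripts produces one row per matching annotation, so rows may need de-duplicating per gene.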


Thank you Pierre,

Would you care to explain where phasing is needed?

And should I do the first step in this workflow? https://gatk.broadinstitute.org/hc/en-us/articles/360035531432-Genotype-Refinement-workflow-for-germline-short-variants


Would you care to explain where phasing is needed?

https://en.wikipedia.org/wiki/Compound_heterozygosity
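To make the link with phasing concrete: two heterozygous variants in the same gene only form a compound heterozygote if their ALT alleles sit on opposite haplotypes, which unphased genotypes (0/1) cannot tell you. A minimal sketch, assuming both genotypes are phased, diploid, biallelic, and belong to the same phase block (the function name is hypothetical):

```python
def in_trans(gt_a, gt_b):
    """Return True if two phased heterozygous genotypes carry their
    ALT alleles on opposite haplotypes (compound heterozygous pattern)."""
    if "|" not in gt_a or "|" not in gt_b:
        raise ValueError("both genotypes must be phased ('|' separator)")
    a = gt_a.split("|")
    b = gt_b.split("|")
    het_a = sorted(a) == ["0", "1"]
    het_b = sorted(b) == ["0", "1"]
    # Opposite haplotypes means one is 0|1 and the other is 1|0
    return het_a and het_b and a != b


print(in_trans("0|1", "1|0"))  # ALT alleles on different haplotypes
print(in_trans("0|1", "0|1"))  # ALT alleles on the same haplotype
```

In a trio, the same information can often be inferred from the parents' genotypes instead of read-based phasing.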


Hi Pierre, PossibleDeNovo from VariantAnnotator doesn't seem to work.

I have a VCF containing this mutation:

12  2721137 .   C   T   41828.10 
GT:AO:DP:GQ:QA:QR:RO    0/0:0:13:59:0:456:13    0/0:1:3:1:33:81:2   0/1:3:10:54:94:210:7    0/0:0:91:99:0:2777:91

The first two samples are the father and the mother, respectively; the third is the proband and the fourth is the unaffected sibling.

I created a PED file from the VCF file with vcftools and edited it to specify the family relationships:

11000   11000.fa    0   0   1   1  
11000   11000.mo    0   0   2   1 
11000   11000.p1    11000.fa    11000.mo    0   2 
11000   11000.s1    11000.fa    11000.mo    0   1
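One thing worth ruling out with a PED file like this is a mismatch between its IDs and the VCF sample names. The sketch below parses the six standard PED columns (family, individual, father, mother, sex, phenotype) and reports individuals missing from the VCF or parents not declared in the PED. The list of VCF sample names is an assumption here; in practice it would come from the #CHROM header line.

```python
PED_COLUMNS = ("family", "individual", "father", "mother", "sex", "phenotype")

PED_TEXT = """\
11000   11000.fa    0   0   1   1
11000   11000.mo    0   0   2   1
11000   11000.p1    11000.fa    11000.mo    0   2
11000   11000.s1    11000.fa    11000.mo    0   1
"""


def parse_ped(text):
    """Parse whitespace-delimited PED lines into dictionaries."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
        records.append(dict(zip(PED_COLUMNS, fields[:6])))
    return records


def ped_problems(records, vcf_samples):
    """Return human-readable problems found in the pedigree."""
    known = {r["individual"] for r in records}
    problems = []
    for r in records:
        if r["individual"] not in vcf_samples:
            problems.append(f"{r['individual']} is not a sample in the VCF")
        for role in ("father", "mother"):
            parent = r[role]
            if parent != "0" and parent not in known:
                problems.append(f"{role} {parent} of {r['individual']} is not in the PED")
    return problems


records = parse_ped(PED_TEXT)
# Assumed VCF sample names; replace with the names from the real header.
vcf_samples = ["11000.fa", "11000.mo", "11000.p1", "11000.s1"]
print(ped_problems(records, vcf_samples) or "no problems found")
```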

As I understand it, this mutation should be annotated as a low-confidence de novo:

INFO=<ID=hiConfDeNovo,Number=1,Type=String,Description="High confidence possible de novo mutation (GQ >= 20 for all trio members)=[comma-delimited list of child samples]">

INFO=<ID=loConfDeNovo,Number=1,Type=String,Description="Low confidence possible de novo mutation (GQ >= 10 for child, GQ > 0 for parents)=[comma-delimited list of child samples]">

The GQ values for the parents are 59 and 1 (so > 0), and the proband's GQ is 54.
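The reasoning can be checked by reading GQ straight from the record and applying the two thresholds quoted in the header descriptions. This sketch only encodes those stated GQ cut-offs; it does not reproduce whatever other checks GATK performs (genotype pattern, likelihoods, and so on), and the function names are hypothetical.

```python
FORMAT = "GT:AO:DP:GQ:QA:QR:RO"
SAMPLES = {
    "father": "0/0:0:13:59:0:456:13",
    "mother": "0/0:1:3:1:33:81:2",
    "proband": "0/1:3:10:54:94:210:7",
}


def genotype_quality(fmt, sample):
    """Extract the integer GQ value from a VCF sample column."""
    return int(dict(zip(fmt.split(":"), sample.split(":")))["GQ"])


def denovo_confidence(father_gq, mother_gq, child_gq):
    """Apply the GQ thresholds from the hiConfDeNovo / loConfDeNovo
    header descriptions. Returns the tag name, or None if neither applies."""
    if min(father_gq, mother_gq, child_gq) >= 20:
        return "hiConfDeNovo"
    if child_gq >= 10 and father_gq > 0 and mother_gq > 0:
        return "loConfDeNovo"
    return None


gq = {name: genotype_quality(FORMAT, col) for name, col in SAMPLES.items()}
print(gq)
print(denovo_confidence(gq["father"], gq["mother"], gq["proband"]))
```

With these values the mother's GQ of 1 rules out the high-confidence tag, and the low-confidence thresholds are met.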

When I run the tool, I see these log messages:

20:37:08.397 INFO PedReader - Reading PED file out.ped with missing fields: []

20:37:08.399 INFO PedReader - Phenotype is other? false

Could this mean that the issue lies in the PED file?

Thank you
