De novo variant detection in trios using only 3 gVCF files
5.9 years ago
sriniparth ▴ 10

Hello, I am working on de novo variant detection in trios. I am looking for genotypes in the child, homozygous or heterozygous, that carry an allele not found in the father or the mother.

So I would call a de novo mutation candidate in the following cases:

Child has genotype 0|1, 1|0 or 1|1 and both parents have 0|0
Child has genotype 1|0 or 1|1 and mother 0|0
Child has genotype 0|1 or 1|1 and father 0|0

Are there any other cases indicating a de novo mutation that I have missed so far?

trio denovo mutation • 1.5k views
5.9 years ago

Are there any other cases which indicate a de novo mutation which I missed so far?

If the child carries a homozygous deletion, the child's genotype would be './.' while both parents are called '0/0'. In that case each parent actually carries the deletion in the heterozygous state, but the caller did not detect it.

5.9 years ago

I don't think you can rely on the order of the alleles in a genotype call to tell which one came from the father and which from the mother; I'm not even sure 1/0 is a valid output. I would always compare with both parents and only consider:

Heterozygous de novo variant
Child = 0/1 | Mother = 0/0 | Father = 0/0

De novo homozygous variant
Child = 1/1 | Mother = 0/1 | Father = 0/0
Child = 1/1 | Mother = 0/0 | Father = 0/1

Child = 1/1 | Mother = 0/0 | Father = 0/0
This last one seems really unlikely, but I guess if you see a case like this you should count it too (or take a closer look at the genotype quality).
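
The cases listed above can be sketched as a small Python function. This is only an illustration: the names `parse_gt` and `classify_trio` and the returned labels are made up here, alleles are treated as unordered (so 1/0, 0|1 and 0/1 are equivalent), and anything with a missing call or an allele index above 1 is set aside rather than classified.

```python
def parse_gt(gt):
    """Split a VCF GT string such as '0/1' or '1|0' into a sorted tuple of alleles."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def classify_trio(child, mother, father):
    """Label a trio genotype pattern (hypothetical labels, biallelic calls only)."""
    c, m, f = parse_gt(child), parse_gt(mother), parse_gt(father)
    alleles = c + m + f

    # '.' means the genotype was not called, so nothing can be concluded.
    if "." in alleles:
        return "missing"
    # Allele indices above 1 (0/2, 1/2, ...) need separate handling.
    if any(a not in ("0", "1") for a in alleles):
        return "multiallelic"

    hom_ref, het, hom_alt = ("0", "0"), ("0", "1"), ("1", "1")

    if c == het and m == hom_ref and f == hom_ref:
        return "de_novo_het"
    if c == hom_alt:
        parents = sorted([m, f])
        if parents == [hom_ref, het]:
            return "de_novo_hom"
        if parents == [hom_ref, hom_ref]:
            # The unlikely case: worth a closer look at genotype quality.
            return "de_novo_hom_unlikely"
    return "not_de_novo"
```

For example, `classify_trio("1/1", "0/1", "0/0")` gives `"de_novo_hom"`, and a child call of `./.` is reported as `"missing"` so it can be reviewed separately.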

And by the way, you can have values other than 0 and 1: a . means the genotype could not be called (for example because of insufficient coverage), and you can also have numbers >1 for additional alternative alleles. I counted the genotypes in the first million lines of a gVCF file and here is the summary:

 988916 0/0  
   6935 0/1  
     50 0/2  
      8 0/3  
   3833 1/1  
    115 1/2  
      9 1/3  
      8 2/2  
      5 2/3  
      1 3/4  
      1 4/5
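
A tally like the one above could be produced with a few lines of Python. This is a sketch, not the command that was actually used for the numbers shown: `count_genotypes` is a made-up name, it assumes standard tab-separated VCF records with FORMAT in column 9 and samples from column 10 on, and it counts the first `limit` records rather than raw lines.

```python
from collections import Counter
from itertools import islice


def count_genotypes(lines, sample_index=0, limit=1_000_000):
    """Count GT values for one sample over the first `limit` VCF records."""
    counts = Counter()
    # Skip '##' meta lines and the '#CHROM' header line.
    records = (ln for ln in lines if not ln.startswith("#"))
    for line in islice(records, limit):
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= 9 + sample_index:
            continue  # no sample column at that position
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # record has no genotype field
        sample = fields[9 + sample_index].split(":")
        counts[sample[fmt.index("GT")]] += 1
    return counts
```

Any iterable of lines works as input, for instance a file opened with `gzip.open(path, "rt")` for a compressed gVCF; `counts.most_common()` then lists the genotypes by frequency.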

Also GATK has a more formal guide on calling de novo variants, you should probably try to use something like that since it incorporates genotype quality: https://software.broadinstitute.org/gatk/documentation/article?id=11074

Step 3: Annotate possible de novo mutations
Tool used: VariantAnnotator
Using the posterior genotype probabilities, possible de novo mutations are tagged. Low confidence de novos have child GQ >= 10 and AC < 4 or AF < 0.1%, whichever is more stringent for the number of samples in the dataset. High confidence de novo sites have all trio sample GQs >= 20 with the same AC/AF criterion.
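
To make the thresholds in the quoted step concrete, here is a rough Python sketch of the GQ part only. It is not GATK's implementation: `de_novo_confidence` is a hypothetical helper, it assumes the site already shows a de novo genotype pattern, and the AC/AF criterion is passed in as a precomputed flag (`passes_ac_af`) because its exact evaluation depends on the number of samples in the dataset.

```python
def de_novo_confidence(child_gq, mother_gq, father_gq, passes_ac_af):
    """Return 'high', 'low' or None using the GQ cut-offs quoted above.

    passes_ac_af is an assumed, precomputed flag for the AC/AF criterion;
    it is deliberately not derived here.
    """
    if not passes_ac_af:
        return None
    # High confidence: every trio member has GQ >= 20.
    if min(child_gq, mother_gq, father_gq) >= 20:
        return "high"
    # Low confidence: only the child's GQ is required to be >= 10.
    if child_gq >= 10:
        return "low"
    return None
```

For example, a child with GQ 25 whose mother has GQ 10 only reaches `"low"`, because the high-confidence tier requires all three samples to have GQ >= 20.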
