phasing de novo mutations
10.3 years ago

Hi folks,

Let's suppose I called the variants in a parent-child trio and filtered for de novo mutations in the child. I am now interested in the phase, that is, I would like to know whether each mutation originated in the male or the female germ line. Under certain circumstances this is possible: if a read covers not only the de novo site but also a heterozygous polymorphism that can only have been transmitted by one parent, this information can be used for phasing.

Let's have a look at the following pseudo vcf file:

#chr pos child father mother
chr1 10   0/1   0/0      0/0
chr1 20   0/1   0/1      0/0

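The second line can be phased from the parental genotypes alone (writing the paternal allele first): the mother is 0/0, so the alternative allele must have come from the father: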

chr1 20   1|0   0/1      0|0
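The genotype-only phasing of the second line can be sketched as a small function. This is a minimal illustration assuming biallelic genotype strings like "0/1" and the paternal|maternal convention used above; the function name `trio_phase` is hypothetical and not part of any tool mentioned in this thread.

```python
def trio_phase(child, father, mother):
    """Phase a child genotype as 'paternal|maternal' from parental genotypes.

    Returns None when the transmission is ambiguous (e.g. all three are
    heterozygous) or when no Mendelian-consistent ordering exists, which is
    the case for a de novo site such as chr1:10 above.
    """
    c = sorted(child.split("/"))
    f = set(father.split("/"))
    m = set(mother.split("/"))
    options = set()
    # Try both orderings of the child's alleles: (paternal, maternal)
    for pat, mat in ((c[0], c[1]), (c[1], c[0])):
        if pat in f and mat in m:
            options.add(f"{pat}|{mat}")
    # The phase is determined only if exactly one ordering is consistent
    return options.pop() if len(options) == 1 else None
```

For example, `trio_phase("0/1", "0/1", "0/0")` gives `"1|0"`, matching the line above, while the de novo site `trio_phase("0/1", "0/0", "0/0")` gives `None`, which is exactly why read information is needed for it.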

Now, if the de novo allele at position 10 is on the same read as the paternally inherited alternative allele at position 20, then we also know the phase of the de novo variant: it arose in the paternal germ line.

chr1 10   1|0   0|0      0|0
chr1 20   1|0   0/1      0|0

Vice versa, if reads spanning both sites show the de novo allele together with the reference allele at position 20 (i.e., the two alternative alleles never occur on the same read), the de novo mutation arose in the maternal germ line:

chr1 10   0|1   0|0      0|0
chr1 20   1|0   0/1      0|0
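The read-based decision described above can be sketched as a vote over reads that span both sites. This is a simplified illustration, not an existing tool: reads are assumed to be already parsed into hypothetical `{position: allele}` dicts (in practice you would extract these from the BAM), and only reads carrying the de novo allele are counted. A fuller approach would also use reads carrying the reference allele at the de novo site.

```python
from collections import Counter


def parent_of_origin(reads, dn_pos, dn_alt, inf_pos, paternal_allele, maternal_allele):
    """Assign a de novo allele to the paternal or maternal haplotype.

    reads: list of dicts mapping position -> observed allele for one read
    (or read pair). Only reads covering both dn_pos and inf_pos and carrying
    the de novo allele are informative.
    """
    counts = Counter()
    for read in reads:
        if dn_pos in read and inf_pos in read and read[dn_pos] == dn_alt:
            allele = read[inf_pos]
            if allele == paternal_allele:
                counts["paternal"] += 1  # de novo allele in cis with the paternal allele
            elif allele == maternal_allele:
                counts["maternal"] += 1  # de novo allele in cis with the maternal allele
    if counts["paternal"] > counts["maternal"]:
        return "paternal", counts
    if counts["maternal"] > counts["paternal"]:
        return "maternal", counts
    return None, counts  # no spanning reads, or conflicting evidence
```

With the example above (paternal allele "1" and maternal allele "0" at position 20), two reads showing "1" at both positions 10 and 20 yield `"paternal"`; reads showing "1" at 10 and "0" at 20 would point to the maternal germ line.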

Does anyone know about a software tool that does this kind of phasing?

Thanks a lot!

peter


Thanks for the information.

Could you please tell me how to find de novo mutations in trio sequencing data?

Thanks in advance!


Hi,

I use the GATK UnifiedGenotyper to generate a multi-sample VCF file. Then I upload the data to GeneTalk, set the affection status and filter for dominant inheritance.

If you need further assistance about GeneTalk, don't hesitate to contact me: peter at gene-talk.de
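For readers without GeneTalk, the basic genotype-level idea can be sketched as follows. This is not how GeneTalk works internally; it is a generic filter assuming a hypothetical record layout of `(chrom, pos, {sample: (genotype, depth)})` with samples named "child", "father" and "mother", and an illustrative depth cutoff `min_dp`. It flags sites where the child carries an allele seen in neither parent, like chr1:10 in the example at the top. Real pipelines usually add further filters (genotype quality, allele balance).

```python
def alleles(gt):
    """Split an unphased ('0/1') or phased ('0|1') genotype into a set of alleles."""
    return set(gt.replace("|", "/").split("/"))


def candidate_de_novo(records, min_dp=10):
    """Yield (chrom, pos) where the child carries an allele absent from both parents.

    records: iterable of (chrom, pos, {sample: (genotype, depth)}) tuples
    (hypothetical layout, not a real VCF parser). min_dp is illustrative.
    """
    for chrom, pos, calls in records:
        trio = [calls[s] for s in ("child", "father", "mother")]
        # Skip missing calls and poorly covered sites, common sources of false de novo calls
        if any(dp < min_dp or "." in alleles(gt) for gt, dp in trio):
            continue
        child, father, mother = (alleles(gt) for gt, _ in trio)
        if child - (father | mother):
            yield chrom, pos
```

Applied to the two sites from the question, only chr1:10 is reported; chr1:20 is explained by the father's heterozygous genotype.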

10.3 years ago
Vivek ★ 2.7k

They don't necessarily have to be on the same read. I think linkage disequilibrium can be applied to infer the haplotype of origin for de novo mutations within 1-5 kb of a variant that follows Mendelian inheritance.

GATK has a tool for this that works on VCF files: ReadBackedPhasing.
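One way to see why the two variants need not share a single read: phase can be chained through intermediate heterozygous sites, each pair linked by reads or read pairs, until a site whose parental origin is known from the trio genotypes is reached. The sketch below propagates assignments along such links with a breadth-first search. Site names and the link format are hypothetical, and conflicting links are simply ignored (first assignment wins), which a real tool would have to handle.

```python
from collections import defaultdict, deque


def propagate_phase(links, anchors):
    """Propagate parental-origin assignments along read-backed links.

    links: (site_a, site_b, same) tuples; same=True means the alt alleles of
           the two sites were seen on the same read (cis), False means trans.
    anchors: {site: 'paternal' | 'maternal'} for sites whose alt allele's
             origin is known, e.g. from trio genotypes.
    Returns the origin of every site reachable from an anchor.
    """
    graph = defaultdict(list)
    for a, b, same in links:
        graph[a].append((b, same))
        graph[b].append((a, same))
    flip = {"paternal": "maternal", "maternal": "paternal"}
    result = dict(anchors)
    queue = deque(anchors)
    while queue:
        site = queue.popleft()
        for nbr, same in graph[site]:
            if nbr not in result:  # first assignment wins; conflicts are not checked
                result[nbr] = result[site] if same else flip[result[site]]
                queue.append(nbr)
    return result
```

For instance, if a de novo site is in cis with a nearby heterozygous site, which is in trans with a trio-phased paternal site further away, the de novo allele is assigned to the maternal haplotype even though no single read covers both the de novo site and the anchor.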


Hi Vivek,

Thanks for your answer! I had a look at the documentation of GATK's ReadBackedPhasing tool. As far as I understood, if we consider n variant positions, all 2^n possible haplotypes are constructed. Although it didn't become clear to me from the parameters, it sounds as if these candidate haplotypes are then compared to known haplotype databases to determine the most likely haplotype.

However, if this is how it works, it won't help with any de novo variant, as such variants cannot be in any haplotype database yet. Thus the haplotype probabilities for the 2^n or 2^(n-1) possibilities should be the same no matter whether I include or exclude the position of the de novo mutation.

Please let me know if I misinterpreted you.

Cheers,
peter
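To make the 2^n versus 2^(n-1) count concrete: each haplotype over n heterozygous sites has a complement, and a haplotype together with its complement describes the same diploid phasing, so there are 2^(n-1) distinct phasings. The sketch below only illustrates this counting; it says nothing about how GATK weighs the candidates.

```python
from itertools import product


def diploid_phasings(n):
    """Enumerate the distinct phasings of n heterozygous sites.

    Each of the 2**n haplotypes (strings of '0'/'1') pairs with its
    complement; keeping one pair per complement class leaves 2**(n - 1).
    """
    seen = set()
    phasings = []
    for hap in product("01", repeat=n):
        h = "".join(hap)
        comp = "".join("1" if a == "0" else "0" for a in h)
        if comp not in seen:  # skip pairs already listed via their complement
            seen.add(h)
            phasings.append((h, comp))
    return phasings
```

For two sites this yields the pairs ("00", "11") and ("01", "10"); adding one more heterozygous site, such as a de novo mutation, doubles the number of candidate phasings.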


I don't see the part about comparing to haplotype databases anywhere in the documentation. As far as I know, the tool considers all possible haplotypes at a given locus and picks the one with the highest probability given the read information. So if your de novo mutation falls within a phased block together with other heterozygous variants, you can assign its haplotype of origin.

Here is a bit more information, and if you search their help forums, there are a bunch of useful threads in which the developer clarifies some of the issues.
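The idea of picking the phasing best supported by reads, with no external database involved, can be sketched with simple read counting. This is a deliberate simplification: the GATK tool uses a probabilistic model, whereas this hypothetical `best_phasing` just counts reads whose alleles fully match one of the two haplotypes. Reads are assumed to be `{site: allele}` dicts restricted to the listed sites, and at least one site is required.

```python
from itertools import product


def best_phasing(sites, reads):
    """Pick the phasing of heterozygous sites best supported by reads.

    sites: ordered list of site ids (all heterozygous, alleles '0'/'1').
    reads: list of dicts {site: allele} covering only sites in `sites`.
    A read supports a phasing if all its alleles match one of the two
    haplotypes. Returns (haplotype, complement, support).
    """
    best = None
    # Fix the first site to '0' so each unordered pair is tried once (2**(n-1)).
    for rest in product("01", repeat=len(sites) - 1):
        hap = dict(zip(sites, ("0",) + rest))
        comp = {s: "1" if a == "0" else "0" for s, a in hap.items()}
        support = sum(
            all(hap[s] == a for s, a in read.items())
            or all(comp[s] == a for s, a in read.items())
            for read in reads
        )
        if best is None or support > best[2]:
            best = ("".join(hap[s] for s in sites), "".join(comp[s] for s in sites), support)
    return best
```

If three reads show the alternative allele at both positions 10 and 20 and one read shows the reference at both, the phasing ("00", "11") is supported by all four reads, placing the de novo allele in cis with the alternative allele at position 20.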
