In our trio data, we have identified germline de novo mutations in the child using a bioinformatics pipeline that we assembled ourselves. In general, some of those de novo mutations might have arisen post-zygotically, leading to embryonic mosaicism.
My question is: from the set of de novo mutations that we have, is there a way to know which ones are mosaic variants? Maybe the latter are characterised by distinguishable allele frequencies?
edit:
Would this be a good idea to try: run a somatic caller on my germline sample? The overlap between the detected somatic mutations and my previous set of de novo mutations could then be considered 'potential' mosaicism. I suggest this because mosaic variants are considered somatic mutations (I guess?). Does this make sense to try?
Thank you for your answer, much appreciated. Could you elaborate on points 2 and 3, please? W.r.t. haplotype phasing, is the following a valid method?
- I can run ReadBackedPhasing to physically phase all my variants.
- For each de novo variant, check whether it is phased to the same haplotype as the other germline variants around it. If yes, it can be considered de novo; if not, mosaic?
What do you mean by 'Mosaic variants have a distinct signature when phased to neighboring germline variants'?
W.r.t. very sensitive sequencing: we are sequencing up to 300x, though of course the coverage is not uniform. Would this be considered 'very sensitive sequencing'?
Whether your answer is yes or no, I would like to know what additional information I can get from 'very sensitive sequencing' that would help me identify mosaicism. Is it the frequency of the specific mutation on each strand, or something else?
I would not use ReadBackedPhasing. If I recall correctly, it does not take advantage of paired-read information. It also will not work here because it does not model somatic variants.
For the phasing, yes you can try to phase each de novo variant to a nearby heterozygous germline variant, but germline variants are sparse, so it will probably only work for a small fraction of your total de novo variants.
For the "distinct signature", please see this picture.
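As a rough illustration of read-level phasing, here is a minimal Python sketch. It assumes one common reading of the signature: a germline de novo variant sits on every read from one parental haplotype, whereas a mosaic variant sits on only some reads from that haplotype, so three haplotypes are observed instead of two. The input format, the function name `classify_by_phasing`, and the `min_reads` threshold are all hypothetical; extracting the per-read alleles from a BAM file is not shown.

```python
from collections import Counter

def classify_by_phasing(read_alleles, min_reads=3):
    """Classify a de novo variant from reads spanning it and a nearby het SNP.

    read_alleles: iterable of (snp_allele, dnm_allele) tuples, one per read
    (or read pair) covering both sites. snp_allele is "A" or "B" (the two
    parental haplotypes); dnm_allele is "ref" or "alt".
    min_reads: hypothetical threshold; combinations seen in fewer reads are
    treated as sequencing noise.
    """
    counts = Counter(read_alleles)
    supported = {combo for combo, n in counts.items() if n >= min_reads}
    alt_haps = {snp for snp, dnm in supported if dnm == "alt"}

    if not alt_haps:
        return "no_alt_support"
    if len(alt_haps) > 1:
        # Alt allele on both haplotypes is not expected for a real variant.
        return "ambiguous"
    hap = next(iter(alt_haps))
    if (hap, "ref") in supported:
        # Same haplotype carries both ref and alt: three haplotypes in total.
        return "candidate_mosaic"
    return "consistent_with_germline"

# Toy inputs, not real data.
germline_like = [("A", "alt")] * 20 + [("B", "ref")] * 20
mosaic_like = [("A", "alt")] * 8 + [("A", "ref")] * 12 + [("B", "ref")] * 20
print(classify_by_phasing(germline_like))
print(classify_by_phasing(mosaic_like))
```

The threshold matters: a single mismatching read is more likely a sequencing error than evidence of a third haplotype, so the cutoff should scale with depth.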
For "sensitive sequencing", I was thinking of targeted sequencing to ~2,000x or so. A heterozygous germline variant is expected at an alternate allele fraction of about 50%. For a given observed alternate allele fraction, sequencing depth, and sequencing error rate, you can calculate the probability that the variant was present in less than 50% of the input. At higher depths or with a more sensitive assay, you can have higher confidence that the variant is present in less than 50% of your input DNA, indicating that the variant is mosaic in the sample.
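To make the depth argument concrete, here is a minimal sketch of one way to do that calculation: a one-sided binomial test of the alt read count against the fraction expected for a heterozygous germline variant. The function names, the default error rate of 0.001, and the simple error model (an alt read is lost with probability e, a ref read is misread as this alt base with probability e/3) are assumptions for illustration only. Real data also show reference bias and overdispersion, which this ignores.

```python
import math

def log_binom_pmf(k, n, p):
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_cdf(k, n, p):
    """P(X <= k), summed in log space so deep coverage does not underflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    logs = [log_binom_pmf(i, n, p) for i in range(k + 1)]
    m = max(logs)
    return min(1.0, math.exp(m) * sum(math.exp(x - m) for x in logs))

def mosaic_pvalue(alt_reads, depth, error_rate=0.001):
    """One-sided p-value: chance of seeing this few alt reads (or fewer)
    if the variant were a true heterozygous germline variant."""
    if not 0 <= alt_reads <= depth:
        raise ValueError("alt_reads must be between 0 and depth")
    # Assumed error model: expected alt fraction of a het site after errors.
    p_het = 0.5 * (1 - error_rate) + 0.5 * (error_rate / 3)
    return binom_cdf(alt_reads, depth, p_het)

# Same 40% alt allele fraction at two depths (toy numbers).
print(mosaic_pvalue(120, 300))
print(mosaic_pvalue(800, 2000))
```

With the same observed allele fraction, the p-value at 2,000x is many orders of magnitude smaller than at 300x, which is the sense in which deeper sequencing gives higher confidence that a variant is below 50%.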