We'd like to perform trio analysis of mice to detect de novo mutations (via WGS and variant calling). The complication: our mouse facility performs breeding in bulk (~15 males and ~15 females). Mom-pup pairs are obviously straightforward. To identify dads, our strategy is to sequence all of the males and infer paternity by comparing dad-pup SNPs. One method is to manually pick SNPs at heterozygous loci that discriminate between candidate dads, but I assume there are tools that can perform this analysis from bulk VCFs. Any suggestions?
I recently came across Peddy, which I'll be trying out soon. VCFtools also has a couple of implementations in --relatedness and --relatedness2, which are based on Yang et al. and Manichaikul et al., respectively. There's also KING, which I think was used in ExAC.
Hi Andrew, I'm trying to figure out what panel of SNPs KING interrogates to calculate kinship, but this doesn't seem to be explicitly stated anywhere. Do you know what this panel is?
From the paper: "The framework underlying the KING approach to relationship inference centres on modelling genetic distance between a pair of individuals as a function of their allele frequencies and kinship coefficient."
I don't believe that a panel is used; rather, a random selection of SNPs is used to infer the coefficient.
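To make the idea of a kinship coefficient computed from genotype counts more concrete, here is a minimal Python sketch of the KING-robust between-family estimator as I understand it from Manichaikul et al. This is not KING's or VCFtools' actual code. The function name and the 0/1/2 genotype coding are my own assumptions for illustration, so please check the formula against the paper before relying on it.

```python
def king_robust_kinship(g1, g2):
    """Estimate kinship between two individuals from biallelic SNP genotypes.

    g1, g2: equal-length sequences of alt-allele counts (0, 1 or 2),
    with None for a missing call. Hypothetical helper for illustration.
    """
    n_het_het = 0   # sites where both individuals are heterozygous
    n_opp_hom = 0   # sites where they are opposite homozygotes (0 vs 2)
    n_het_1 = 0     # heterozygous sites in individual 1
    n_het_2 = 0     # heterozygous sites in individual 2

    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only use sites called in both individuals
        if a == 1:
            n_het_1 += 1
        if b == 1:
            n_het_2 += 1
        if a == 1 and b == 1:
            n_het_het += 1
        if {a, b} == {0, 2}:
            n_opp_hom += 1

    m = min(n_het_1, n_het_2)
    if m == 0:
        raise ValueError("no heterozygous sites in at least one individual")

    return ((n_het_het - 2 * n_opp_hom) / (2 * m)
            + 0.5
            - (n_het_1 + n_het_2) / (4 * m))


# An individual compared with itself gives the self-kinship value of 0.5.
genotypes = [0, 1, 2, 1, 1, 0]
print(king_robust_kinship(genotypes, genotypes))
```

For a real dad-pup pair you would expect a value near 0.25, and values near zero or below for unrelated animals.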
There are several options; here are a couple that come to mind.
The most rigorous one would be to use Mendel. Method 9 is pedigree selection. Basically, you specify all the possible pedigrees and it tells you which one is the most likely. Cons of this approach: 1) I am not sure it is easy to specify your data structure (one trio plus 14 unrelated males) to the software. You might have to do several comparisons between trios where the mom-pup pair stays the same and the dad changes. 2) I do not think Mendel takes VCF as input.
Another approach would be to use the number of Mendelian inconsistencies to find the most probable father, using for example the mendelian plugin (I have never tried it; GATK surely has something similar). Basically, you run it on all the possible trios, i.e. for each mom-pup pair you rotate through all the candidate dads. If the dads are not related to the mom, then one of them (the real one) should show a noticeably lower number of Mendelian inconsistencies.
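The rotation described above can be sketched in a few lines of Python. This is only an illustration of the counting logic, not the plugin's or GATK's implementation. The function names, the allele-tuple genotype coding, and the toy data are all assumptions I made up for the example; in practice you would pull the genotypes from your VCF.

```python
def is_consistent(mom, dad, pup):
    """True if the pup genotype can be formed from one mom allele and one dad allele.

    Each genotype is a tuple of two alleles, e.g. (0, 1).
    """
    a, b = pup
    return (a in mom and b in dad) or (b in mom and a in dad)


def count_inconsistencies(mom_gts, dad_gts, pup_gts):
    """Count sites where the trio violates Mendelian inheritance.

    Sites with a missing genotype (None) in any member are skipped.
    """
    errors = 0
    for mom, dad, pup in zip(mom_gts, dad_gts, pup_gts):
        if mom is None or dad is None or pup is None:
            continue
        if not is_consistent(mom, dad, pup):
            errors += 1
    return errors


def rank_candidate_dads(mom_gts, pup_gts, candidate_dads):
    """Return (dad_name, n_inconsistencies) pairs, best candidate first."""
    counts = [
        (name, count_inconsistencies(mom_gts, dad_gts, pup_gts))
        for name, dad_gts in candidate_dads.items()
    ]
    return sorted(counts, key=lambda item: item[1])


# Toy data: four sites, one mom-pup pair, two hypothetical candidate dads.
mom = [(0, 0), (0, 1), (1, 1), (0, 0)]
pup = [(0, 1), (1, 1), (0, 1), (0, 0)]
dads = {
    "dadA": [(1, 1), (0, 1), (0, 0), (0, 1)],
    "dadB": [(0, 0), (0, 0), (1, 1), (1, 1)],
}
print(rank_candidate_dads(mom, pup, dads))
# → [('dadA', 0), ('dadB', 4)]
```

On real data the true father will not score exactly zero, since genotyping errors and the de novo mutations you are looking for also show up as inconsistencies, so look for a clear gap between the best candidate and the rest.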
Be advised that whichever analysis you choose, you should run it on a very reliable set of SNPs.
Thanks, Andrew. I'm using VCFtools for other metrics, so I'll probably test it first. Will report back with results.
Thanks, Andrew. Yes, actually it looks like it uses all of them. (See https://www.biostars.org/p/313503/#313672)