I think you should describe your project in more detail, and start by answering these essential questions.
What kind of technology was used for genotyping, e.g. Illumina sequencing or chips? Which parts of the genome were genotyped, e.g. the whole exome? How was the variant calling done, e.g. with samtools?
In which format do your variant calls come, e.g. VCF?
The fact that you omit this essential information tells me that your project is not well defined at the moment and needs some care. If this is a thesis project, I strongly suggest you talk to your supervisor about it.
In this case I think the primary question should not be whether some R package does the analysis (so, in your best interest, I won't answer that right now), but whether the data you have are suitable for the analysis you wish to do. We are not in a position to judge this yet, so please give more details so that we can help you better.
Let me add that the actual calculation to derive the number of mutations could be trivial. In the optimal case you would have the following data and analysis steps:
- High-coverage whole-exome sequencing data for each trio
- Reads aligned against the reference genome
- Heterozygous variant calls in all samples using standard tools, e.g. samtools or GATK
- The variant calls in a standard format: a single VCF file plus an annotation of the family relations
Then you can perform a variety of tests, e.g. a test for deviation from Hardy-Weinberg equilibrium, which could indicate the presence of mutation, among other things.
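As a rough illustration of such a test, here is a minimal Python sketch of a chi-square test (1 degree of freedom) on the genotype counts at a single biallelic site. The function name `hwe_chi_square` and the example counts are made up for illustration; in practice you would probably restrict the counts to unrelated individuals (e.g. the parents), since related samples are not independent.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic site.
    Returns (chi2, p_value) for 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: no heterozygotes at all, a strong deviation
chi2, p_value = hwe_chi_square(50, 0, 50)
```

A small p-value only says that the genotype proportions deviate from expectation; genotyping error or population structure can cause that just as well as mutation.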
To discover novel variants, it might be a good idea to compare all samples against the known variants in dbSNP. How many novel variants can be discovered that way?
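Once both the calls and the known variants are parsed, that comparison is essentially a set lookup. A minimal Python sketch, assuming each variant has already been reduced to a `(chrom, pos, ref, alt)` tuple (the function name and the toy data are hypothetical, not taken from any package):

```python
def novel_variants(called, known):
    """Return the called variants that are absent from the known set.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    The order of the called variants is preserved.
    """
    known_keys = set(known)
    return [v for v in called if v not in known_keys]

# Toy data standing in for parsed VCF records and a dbSNP extract
called = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]
dbsnp = [("1", 1000, "A", "G")]
novel = novel_variants(called, dbsnp)
```

This assumes both sources use the same reference build, chromosome naming and allele representation; otherwise the keys will not match.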
Finally, to calculate mutation rates from the parent to the child generation, it will be sufficient to look at the new variants (deviations from the reference genome) that are called in the children and not called in the parents. These will give an approximation of the mutation rate plus the variants induced by sequencing errors. While it is possible that a mutation occurs at an already variable site, the probability is relatively low compared with the whole genome, so I think you could safely neglect such cases. On the other hand, if you have only genotyped already known variants and are looking for novel alleles introduced at these sites, then your ability to infer a mutation rate is probably hampered by the lack of evidence, because these events are so rare.
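The "called in the child, not called in the parents" step could be sketched like this in Python. It assumes each sample has been reduced to a set of non-reference variant keys `(chrom, pos, ref, alt)`; the function name and the toy trio are hypothetical.

```python
def de_novo_candidates(child, mother, father):
    """Variants called in the child but in neither parent.

    Each argument is a set of (chrom, pos, ref, alt) tuples holding the
    non-reference calls of one sample. The result still contains
    sequencing errors and sites that were simply not covered in a parent.
    """
    return set(child) - set(mother) - set(father)

# Toy trio
child = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("3", 42, "T", "C")}
mother = {("1", 1000, "A", "G")}
father = {("1", 2000, "C", "T")}
candidates = de_novo_candidates(child, mother, father)
```

Filtering on coverage and genotype quality in all three samples before this step would probably remove many of the false candidates.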
Did you define your problem well enough? Why do you believe it is possible to calculate mutation rates from your data? Note that a variant is not necessarily a mutation.
If the allele is present in the child but in neither of the parents, and it is not a sequencing error, it must be a new mutation, right?
That sounds reasonable, but does it yield a mutation rate? How do you define mutation rate? I understand that you wish to detect all variant alleles not present in the parents (neither as homozygous nor as heterozygous calls), then count these and average over all family samples, which would give an average "mutation rate" per generation. That sounds feasible to script, but I am not sure whether there is an R package that does exactly this out of the box. If you could give an example of your data format, it should be easy to figure something out.
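To make the counting idea concrete, here is a minimal Python sketch. It assumes each sample is a dict mapping a site `(chrom, pos)` to its genotype as a tuple of two alleles; all names and the toy families are hypothetical, since the real data format is not known yet.

```python
from statistics import mean

def count_new_alleles(child, mother, father):
    """Count child alleles seen in neither parent's genotype.

    Each argument maps a site (chrom, pos) to a tuple of two alleles.
    Sites without a call in both parents are skipped.
    """
    n = 0
    for site, child_gt in child.items():
        if site not in mother or site not in father:
            continue  # cannot decide without both parental genotypes
        parental = set(mother[site]) | set(father[site])
        n += len(set(child_gt) - parental)
    return n

def mean_new_alleles(families):
    """Average the count over (child, mother, father) triples."""
    return mean(count_new_alleles(c, m, f) for c, m, f in families)

# Toy families: the first has one new allele (G at 1:100), the second none
fam1 = ({("1", 100): ("A", "G"), ("1", 200): ("C", "C")},
        {("1", 100): ("A", "A"), ("1", 200): ("C", "T")},
        {("1", 100): ("A", "A"), ("1", 200): ("C", "C")})
fam2 = ({("1", 100): ("A", "A")},
        {("1", 100): ("A", "A")},
        {("1", 100): ("A", "G")})
average = mean_new_alleles([fam1, fam2])
```

The result is an average count of unexplained alleles per trio, not a rate; turning it into a per-site rate would require dividing by the number of sites that were actually callable in all three samples.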
I agree with you. It will not yield a mutation rate; it will only count mismatches. I have also added an example to my question.