I'm running Merlin on a set of affected individuals who are second or third cousins. I have whole-genome sequencing data for the affected individuals, but not for any parents, grandparents, or siblings. I managed to run Merlin and produce some output, but some of the variants with high LOD scores are present in only one patient. I suspect the high LOD for such variants is driven by the major (reference) allele, which is present in most of the patients. This is a problem because the reference allele should not be treated as a variant. Here are my questions:
- Why is Merlin using the major alleles and giving these high LODs?
- Should we include the reference (major) allele in the MAP file and the allele frequency file?
- What formula do you recommend for building an allele frequency file?
I used variants with a gnomAD allele frequency < 0.05 and calculated the major allele frequency as 1 - (gnomAD allele frequency).
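A minimal Python sketch of the filter and complement described above. It assumes each site is biallelic, that the gnomAD value is the alternate (non-reference) allele frequency, and that the reference allele is therefore the major allele. The function name `merlin_allele_freqs` and the `max_af` parameter are hypothetical, chosen only for illustration.

```python
def merlin_allele_freqs(gnomad_af, max_af=0.05):
    """Return (minor_freq, major_freq) for a biallelic site, or None if filtered out.

    Assumption: gnomad_af is the alternate-allele frequency and the
    reference allele is the major allele.
    """
    if not 0.0 <= gnomad_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {gnomad_af}")
    if gnomad_af >= max_af:
        return None  # keep only variants with gnomAD AF < max_af
    minor = gnomad_af
    major = 1.0 - gnomad_af  # complement, so the two frequencies sum to 1
    return minor, major


if __name__ == "__main__":
    # Marker1 from the example file: the minor allele has AF 0.0277335
    print(merlin_allele_freqs(0.0277335))
```

The complement only holds when exactly two alleles are segregating at the site; multiallelic sites would need every allele's frequency listed explicitly.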
This is what my allele frequency file looks like; it follows a format similar to the allele frequency files generated by Merlin:
M Marker1
A 2 0.0277335
A 4 0.9722665
M Marker2
A 1 0.021063
A 3 0.978937
Here, alleles 4 and 3 are the major (reference) alleles.
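A minimal Python sketch that writes markers in the same `M <name>` / `A <allele> <freq>` layout as the example above and checks that each marker's frequencies sum to 1. It only reproduces the layout shown in this post; the function name `format_freq_file` and the `tol` parameter are hypothetical, and the numeric allele codes are assumed to match whatever coding the pedigree file uses.

```python
from typing import Iterable, Sequence, Tuple

# One marker = (marker name, [(allele code, frequency), ...]).
# Allele codes are assumed to match the coding used in the pedigree file.
Marker = Tuple[str, Sequence[Tuple[int, float]]]


def format_freq_file(markers: Iterable[Marker], tol: float = 1e-6) -> str:
    """Render markers in the 'M <name>' / 'A <allele> <freq>' layout shown above."""
    lines = []
    for name, alleles in markers:
        total = sum(freq for _, freq in alleles)
        # All allele frequencies at a marker should add up to 1.
        if abs(total - 1.0) > tol:
            raise ValueError(f"{name}: allele frequencies sum to {total}, not 1")
        lines.append(f"M {name}")
        for allele, freq in alleles:
            # .10g drops trailing zeros and tiny floating-point residue from 1 - AF
            lines.append(f"A {allele} {freq:.10g}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        ("Marker1", [(2, 0.0277335), (4, 1 - 0.0277335)]),
        ("Marker2", [(1, 0.021063), (3, 1 - 0.021063)]),
    ]
    print(format_freq_file(example), end="")
```

Running it with the two markers above regenerates the example file line for line. The sketch formats whatever alleles it is given, so it does not by itself settle whether the reference allele should be listed.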