I have been trying to work out how current methods for studying gene-gene interactions handle multiallelic SNPs, and I am struggling to find any explicit published discussion of the issue. For two biallelic SNPs, I believe the joint distribution of (unphased) genotypes can be expressed with the following table:
          SNP2
          BB    Bb    bb
SNP1  AA  AABB  AABb  AAbb
      Aa  AaBB  AaBb  Aabb
      aa  aaBB  aaBb  aabb
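This table is simply the Cartesian product of the unphased genotypes at each SNP. A minimal Python sketch that builds it; the helper names `genotypes` and `joint_table` are hypothetical, and alleles are assumed to be single-character codes:

```python
from itertools import combinations_with_replacement

def genotypes(alleles):
    """Unordered (unphased) genotypes at one SNP, e.g. "Aa" -> ["AA", "Aa", "aa"]."""
    # combinations_with_replacement yields each unordered allele pair exactly once,
    # in the order in which the alleles are listed.
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]

def joint_table(alleles1, alleles2):
    """Rows are SNP1 genotypes, columns are SNP2 genotypes; each cell is the two-locus genotype."""
    cols = genotypes(alleles2)
    return {g1: [g1 + g2 for g2 in cols] for g1 in genotypes(alleles1)}

for g1, row in joint_table("Aa", "Bb").items():
    print(g1, row)
```

Passing a longer allele string, such as `"ACT"`, builds the larger multiallelic tables in the same way.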
To give a concrete example, consider two biallelic SNPs: rs1200 (alleles A and G) and rs801 (alleles C and G). Their joint distribution is therefore:
            rs801
            CC    CG    GG
rs1200  AA  AACC  AACG  AAGG
        AG  AGCC  AGCG  AGGG
        GG  GGCC  GGCG  GGGG
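When this table is filled from real data, heterozygotes may be written in either allele order (e.g. `GA` vs `AG`), so they need normalising before counting. A small sketch using `collections.Counter`; the function names and the five genotype pairs are hypothetical, made up purely to illustrate the tally:

```python
from collections import Counter

def canonical(genotype, allele_order):
    """Write a genotype in a fixed allele order so that "GA" and "AG" are the same class."""
    return "".join(sorted(genotype, key=allele_order.index))

def count_joint(pairs, alleles1, alleles2):
    """Tally one (SNP1, SNP2) genotype pair per individual into joint-table cell counts."""
    return Counter((canonical(g1, alleles1), canonical(g2, alleles2)) for g1, g2 in pairs)

# Hypothetical genotypes for five individuals at rs1200 (A/G) and rs801 (C/G).
samples = [("AG", "CG"), ("GA", "GC"), ("AA", "CC"), ("GG", "GG"), ("AG", "CC")]
counts = count_joint(samples, "AG", "CG")
print(counts[("AG", "CG")])  # 2: the "GA"/"GC" individual is folded into the AGCG cell
```

Cells with no observations simply return 0 from the `Counter`.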
Suppose we now want to compare rs1200 with a triallelic SNP, rs1029256, whose alleles are A, C and T. I believe the following joint distribution is then required for unphased genotypes:
            rs1029256
            AA    AC    AT    CC    CT    TT
rs1200  AA  AAAA  AAAC  AAAT  AACC  AACT  AATT
        AG  AGAA  AGAC  AGAT  AGCC  AGCT  AGTT
        GG  GGAA  GGAC  GGAT  GGCC  GGCT  GGTT
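The growth is easy to quantify: a locus with k alleles has k(k+1)/2 unphased genotypes, so the joint table has 3 × 3 = 9 cells for two biallelic SNPs, 3 × 6 = 18 for rs1200 against rs1029256, and 6 × 6 = 36 for two triallelic SNPs. A quick check in Python (the function names are illustrative only):

```python
from itertools import combinations_with_replacement

def n_genotypes(k):
    """Number of unordered genotypes at a locus with k alleles: k(k+1)/2."""
    return k * (k + 1) // 2

def n_joint_cells(k1, k2):
    """Number of cells in the joint genotype table of two loci with k1 and k2 alleles."""
    return n_genotypes(k1) * n_genotypes(k2)

# Explicit enumeration for rs1029256 (alleles A, C, T) agrees with the formula.
tri = ["".join(p) for p in combinations_with_replacement("ACT", 2)]
print(tri)  # ['AA', 'AC', 'AT', 'CC', 'CT', 'TT']

for k1, k2 in [(2, 2), (2, 3), (3, 3), (2, 4), (4, 4)]:
    print(k1, k2, n_joint_cells(k1, k2))
```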
The number of genotype combinations grows quickly with the number of alleles (18 cells here instead of 9), which must increase the complexity of the problem, and I imagine many methods cannot handle these SNPs in the same way as biallelic ones. Are multiallelic SNPs generally dropped from the analysis, or recoded so that all minor alleles are grouped together?
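Grouping the minor alleles, one of the two options raised above, could look like the sketch below. It assumes a known major allele (treating A as the major allele of rs1029256 is purely an assumption for illustration) and uses a made-up placeholder label `X` for the pooled alleles; it is one possible recoding, not a description of what any particular published method does:

```python
from itertools import combinations_with_replacement

def collapse(genotype, major, other="X"):
    """Recode a multiallelic genotype as biallelic: keep the major allele, pool every
    other allele into the placeholder label `other`, and write the major allele first."""
    recoded = [a if a == major else other for a in genotype]
    # Sort key is False for the major allele, so it comes first; sorted() is stable.
    return "".join(sorted(recoded, key=lambda a: a != major))

# Illustration only: "A" as the major allele of rs1029256 is an assumption.
for g in ("".join(p) for p in combinations_with_replacement("ACT", 2)):
    print(g, "->", collapse(g, major="A"))
```

This reduces rs1029256 to three classes (AA, AX, XX), so its joint table with rs1200 returns to 3 × 3 cells, at the cost of no longer distinguishing, for example, CC from CT or TT.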
Thanks for any help you can provide.