Hi, I'm studying the combined effect of SNPs and CNVs in GWAS. Normally SNPs and CNVs are analyzed separately, but what if a SNP lies within a CNV (let's call it a CNV-SNP)? And what if one chromosome carries the CNV-SNP and the other does not? In the end, we need to encode genetic variants as numbers so that a model can be built, and a CNV-SNP interferes with that process.
I have read some studies on CNV and quantitative trait association, but my question is still unanswered. One paper mentions the issue: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6200315/, but it does not explain how to account for the effect of a CNV-SNP.
If anyone has dealt with this problem before, I would appreciate your help.
Thank you very much.
Usually for a SNP we assume underlying diploidy and code the genotype as 0 (no variant allele), 1 (heterozygous) or 2 (homozygous for the variant). You could code your "duplicated SNPs" as 3, and in theory the regression models used in GWAS should tolerate that; however, it is not clear whether the duplicated copy is functional.
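As a rough illustration of that coding, here is a minimal Python sketch. It assumes you somehow know which allele sits on each copy of the locus, which is often not the case in practice; the function name `alt_dosage` and the example alleles are made up for illustration.

```python
from typing import Sequence


def alt_dosage(alleles: Sequence[str], alt: str) -> int:
    """Count how many copies of the locus carry the variant (alt) allele.

    `alleles` lists one allele per copy of the locus: two entries for a
    normal diploid sample, three for a sample with one duplicated copy.
    This per-copy knowledge is an assumption of the sketch.
    """
    return sum(1 for allele in alleles if allele == alt)


# Ordinary diploid genotypes give the usual 0/1/2 coding.
hom_ref = alt_dosage(["A", "A"], alt="G")   # 0
het = alt_dosage(["A", "G"], alt="G")       # 1
hom_alt = alt_dosage(["G", "G"], alt="G")   # 2

# Hypothetical duplication carrier with the variant on all three copies.
dup_carrier = alt_dosage(["G", "G", "G"], alt="G")  # 3
```

Whether a dosage of 3 should count as "more" than 2 in the model still depends on whether the extra copy is functional, so treat this as one possible coding rather than the correct one.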
Another thing: you need to work with reasonably frequent CNVs (seen at least several times in the cohort) to include them in GWAS. Otherwise I don't see any problem with including them: separate deletions from duplications, code deletions as 0/1/2 and duplications as 0/1/2 (anything above 4 copies can be capped at 2), and simply put them into your linear models.
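The deletion/duplication coding can be sketched like this in Python. It assumes integer total copy-number calls per sample with 2 as the normal diploid state; the names `encode_cnv` and `carrier_count` and the example calls are hypothetical, and the minimum number of carriers is left for you to choose.

```python
from typing import List, Tuple


def encode_cnv(copy_number: int) -> Tuple[int, int]:
    """Split a total copy number into (deletion, duplication) dosages.

    Deletion dosage: copies lost relative to 2, so CN 1 -> 1 and CN 0 -> 2.
    Duplication dosage: copies gained relative to 2, capped at 2,
    so CN 3 -> 1 and CN 4 or more -> 2.
    """
    if copy_number < 0:
        raise ValueError("copy number cannot be negative")
    deletion = min(max(2 - copy_number, 0), 2)
    duplication = min(max(copy_number - 2, 0), 2)
    return deletion, duplication


def carrier_count(dosages: List[int]) -> int:
    """Number of samples with a non-zero dosage, for a frequency filter."""
    return sum(1 for dosage in dosages if dosage > 0)


# Made-up copy-number calls for seven samples at one CNV region.
copy_numbers = [2, 1, 0, 3, 4, 6, 2]
encoded = [encode_cnv(cn) for cn in copy_numbers]
deletions = [deletion for deletion, _ in encoded]
duplications = [duplication for _, duplication in encoded]
```

The two resulting columns can then go into the linear model as separate predictors, after dropping any column whose `carrier_count` is too low for your cohort.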
Arrays simply don't resolve allele-specific copy number (except in cancer, where CNVs are very long, and even then it is quite limited), so I wouldn't worry about allele-specific effects. They could maybe be inferred using linkage disequilibrium, but meh, that is a complex thing =)