simulation of genetic data under several given conditions in R
1
0
Entering edit mode
10.5 years ago

I want to generate a 1600 samples (800 cases and 800 controls) and only 2 snps data for interaction analysis.

P(D|AABB)=0.0909, The condition that I know is MAF(minor allele frequency p(a) or p(b)) =0.1 and genetic heritability h^2 = 0.03, prevalence p(D)=0.1. The epistasis model is a multiplicative model as in table(odds table) below , with alpha=0.09989 and theta=3.4481.

model1  BB        Bb            bb
AA      a(alpha)  a             a
Aa      a         a(1+theta)    a(1+theta)^2
aa      a         a(1+theta)^2  a(1+theta)^4

penetrance for each genotype will be as follow:

P(D|AABb)=0.0909,
P(D|AAbb)=0.0909,
P(D|AaBB)=0.0909,
P(D|AaBb)=0.30747,
P(D|Aabb)=0.6645,
P(D|aaBB)=0.0909,
P(D|aaBb)=0.6645,
P(D|aabb)=0.97491

Does anyone have any ideas?

Maybe it is a little difficult understand...

SNP R gene • 2.8k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode
What are you trying to simulate?
ADD REPLY
3
Entering edit mode
10.5 years ago
David W 4.9k

It's hard to know exactly what you are trying to do from this question.

Are you trying to simulate genotypes for cases and controls given what you know about the genetic basis of some disease?

If that's the case you don't need to worry about heritability of the epistasis model, because you have the result of all of that in the penetrances. To turn the pentrance P(D | genotype_i) into a probability of having a genotype given disease-state p(genotype_i | D) you can use Bayes rule.

P(genotype_i | D) = [ P(D | genotype_i)*P(genotype_i) ] / P(D)

The only term you don't already have explicitly here is P(genotype_i) but if you can assume HWE and independent assortment you can get that from the allele frequencies. In R, and matching the order you listed your penetrance data:

hwe <- c(0.9^2, 2*0.9*0.1, 0.1^2)
p_genotypes <- rep(hwe, each=3) * rep(hwe,3)
#should sum to one
all.equal(sum(p_genotypes),1)

From there you can get the probability of each genotype in a case or control (assuming you have your pentrance data in a vector called pen):

genotypes_D <- (pen * p_genotypes)/0.1

And use those to sample genotypes form the multinomial

rowSums(rmultinom(n=800, size=1, prob=genotypes_D))
ADD COMMENT
0
Entering edit mode

Really thank you for your help ~!

ADD REPLY

Login before adding your answer.

Traffic: 1874 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6