Simulate Associated Phenotype From Existing Genotype Data In R
12.1 years ago

Hi,

I am looking for a way to simulate a phenotype associated with several loci, starting from existing genotype data in R. Is it possible to do this simulation directly in R? I am not a statistician, and I cannot find a good reference for the calculations needed to simulate an association with several loci and several causal SNPs that explain a given percentage of the variance, with a given heritability. If anybody has an idea of how to do this type of calculation directly in R, I would appreciate any help.

Thank you in advance. James

simulation gwas • 5.8k views
12.1 years ago
Josh Herr 5.8k

I'm by no means knowledgeable in this area, but one of my co-workers uses a few programs for modeling phenotypes from genetic map data estimated from SNPs.

She uses phenosim and simrare; I think simrare is compatible with R. You might also want to check out the hypred package in R, but I'm not sure if it specifically will meet your needs.


Thank you Josh. It looks like all of these programs require either that the genotype data be generated with the same software, or that the genotype data be reformatted as if it had been. I thought there would be a simpler way in R, using either a few commands or an R package. If there is indeed a way to do this with R commands or an R package, please let me know; otherwise it looks like I will need to use software like the programs Josh suggested.

Thank you

12.1 years ago
Genotepes ▴ 950

Hi

I think GCTA can do what you want. It is designed for genome-wide data, but I think it can be used with a more restricted data set. The --simu-causal-loci causal.snplist option allows you to choose which SNPs are causal.

You will need to have an idea of the relationship between your effects (additive, so equivalent to the +a of the biometrical model) and the total variance, to avoid a set of SNPs explaining more than 100% of the variance. But besides this problem, it looks like the program is easy to handle (and can be wrapped in an R command, although this means you need to call it many times).

Christian

http://www.complextraitgenomics.com/software/gcta/Simu.html
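
For anyone wanting to wrap the GCTA call in R as suggested, here is a minimal sketch. It assumes the executable is on the PATH under the name gcta64 and that the genotypes are in PLINK binary format; the file names (mydata, causal.snplist, sim_pheno) are hypothetical placeholders, and the heritability of 0.5 is just an example value. The options are the ones listed on the GCTA simulation page linked above.

```r
# Arguments for a GCTA quantitative-trait simulation.
# All file names are hypothetical placeholders.
gcta_args <- c(
  "--bfile", "mydata",                     # PLINK binary files: mydata.bed/.bim/.fam
  "--simu-qt",                             # simulate a quantitative trait
  "--simu-causal-loci", "causal.snplist",  # list of causal SNP IDs
  "--simu-hsq", "0.5",                     # target heritability (example value)
  "--simu-rep", "1",                       # number of simulation replicates
  "--out", "sim_pheno"                     # prefix for output files
)

# Assumption: the binary is called gcta64 and is on the PATH.
gcta_bin <- unname(Sys.which("gcta64"))

if (nzchar(gcta_bin) && file.exists("mydata.bed")) {
  status <- system2(gcta_bin, gcta_args)
} else {
  message("GCTA or input files not found; the command would be:\n  gcta64 ",
          paste(gcta_args, collapse = " "))
}
```

To simulate many phenotypes, the same call can be repeated in a loop with a different --out prefix each time.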


Thank you Christian. I had actually seen this, but I would need to modify the input files to match MACH output, since I am not using that software for imputation. Still, GCTA could help me find the calculations to do this directly in R: I guess GCTA works by drawing the effects of the causal variants from a standard normal distribution, and drawing the residuals from a normal distribution with mean 0 and variance var(g) * (1/h^2 - 1), where g is the simulated genetic value and h^2 is the heritability, so maybe this is all that is needed?
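
A minimal sketch of that model in plain R, for illustration only. It assumes a numeric genotype matrix with individuals in rows and SNPs in columns; the toy matrix geno, the function name simulate_phenotype, and the choice of 10 causal SNPs are all hypothetical. Genotypes are standardised with scale(), which uses the sample mean and standard deviation (an approximation to standardising by allele frequency), and the residual variance is set to var(g) * (1/h^2 - 1) so that the genetic values explain a fraction h^2 of the phenotypic variance.

```r
set.seed(1)

# Hypothetical toy data: 500 individuals x 200 SNPs coded 0/1/2.
# Replace `geno` with your own genotype (or dosage) matrix.
n <- 500
m <- 200
maf <- runif(m, 0.05, 0.5)
geno <- sapply(maf, function(p) rbinom(n, 2, p))

simulate_phenotype <- function(geno, causal, h2) {
  stopifnot(h2 > 0, h2 < 1)
  x <- scale(geno[, causal, drop = FALSE])  # standardise the causal SNPs
  beta <- rnorm(length(causal))             # causal effects ~ N(0, 1)
  g <- as.vector(x %*% beta)                # genetic value per individual
  var_e <- var(g) * (1 / h2 - 1)            # residual variance for target h2
  e <- rnorm(length(g), mean = 0, sd = sqrt(var_e))
  list(y = g + e, g = g, beta = beta)
}

causal <- sample(m, 10)                     # pick 10 causal SNPs at random
sim <- simulate_phenotype(geno, causal, h2 = 0.5)

# Realised heritability in this sample; should be close to the target.
var(sim$g) / var(sim$y)
```

The realised value fluctuates around the target because the residuals are sampled; larger samples give a closer match. Monomorphic SNPs should be excluded from the causal set, since scale() cannot standardise them.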


So, does this mean you need to generate a model where imputed SNPs are causal? I think you'll need to apply a threshold to the imputed values and use a plain genotype.

As for model generation, you are right.
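
One possible way to turn imputed dosages into plain genotypes in R is sketched below. The dosage values, the function name hard_call, and the tolerance of 0.1 are hypothetical choices for illustration; the appropriate threshold depends on the imputation quality of your data.

```r
# Hypothetical imputed dosages (expected allele counts between 0 and 2);
# rows are individuals, columns are SNPs.
dosage <- matrix(c(0.02, 0.97, 1.95,
                   1.48, 0.51, 0.05),
                 nrow = 2, byrow = TRUE)

# Round each dosage to the nearest genotype (0, 1 or 2) and set calls
# that are further than `tol` from that genotype to missing.
hard_call <- function(dosage, tol = 0.1) {
  calls <- round(dosage)
  calls[abs(dosage - calls) > tol] <- NA
  storage.mode(calls) <- "integer"
  calls
}

hard_call(dosage)
```

Uncertain calls come back as NA, so they need to be dropped or handled before the matrix is used to simulate a phenotype.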


Thanks again Christian. Yes, that is correct, sorry if it wasn't clear: I am trying to generate a phenotype associated with several imputed causal SNPs.
