I'm looking for a way to simulate phenotypes against a real SNP data source, such as the 1000 Genomes. It must be free for commercial use (e.g. under an MIT license). Any recommendations? I'm trying to use GCTA (gcta64), but I couldn't get it working. The documentation doesn't help much, as it has no practical examples or tutorials. At the end of the day, I want to:
1 - Simulate a case/control and/or quantitative phenotype
2 - Link it with a real SNP dataset (e.g. 1000 Genomes)
3 - Conduct a GWAS analysis using Plink and/or Hail.
Looking at Hail's and Plink's GWAS tutorials, I realised both use simulated phenotypes on top of real SNP data sources (1000 Genomes and HapMap, respectively), but how those datasets were created is beyond the scope of the tutorials and thus not reported.
As mentioned before, I've tried gcta64, but with no success. Here's what I've tried:
1 - Downloaded the 1000 Genomes sample from the Plink page: Entire dataset as a single .tar.gz (1.12 GB) (A2 allele major, not ref, on chr3 before 15 Oct 2017)
2 - Tried to generate the simulated data with:
./gcta64 --bfile 1kGenomesP1/1kg_phase1_all/1kg_phase1_all --simu-qt --simu-causal-loci causal.snplist --simu-hsq 0.5 --simu-rep 3 --keep test.indi.list --out 1kg_phase1_all
Error: Error: --keep test.indi.list not found
What should the files causal.snplist and test.indi.list contain? Is there any practical example or tutorial?
Btw - I apologise in advance if this is too trivial a question. I'm quite new to this. I appreciate your patience and help :)
Did you get a chance to figure out the steps? I am doing the same thing.
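For anyone landing here with the same error: as far as I can tell from the GCTA documentation, causal.snplist is a plain-text list of causal SNP IDs, one per line (an optional second column gives the effect size, otherwise effects are drawn from a standard normal), and the file passed to --keep is a two-column list of family ID and individual ID with no header. Neither file ships with the 1000 Genomes download, so they have to be created first. Below is a minimal sketch, not a definitive recipe, that builds both from the PLINK .bim and .fam files. The function names, the default of 100 causal SNPs and the seed are my own arbitrary choices, not anything GCTA prescribes.

```python
import random


def write_causal_snplist(bim_path, out_path, n_causal=100, seed=1):
    """Pick SNP IDs at random from a PLINK .bim file and write one per line.

    n_causal and seed are illustrative defaults (assumptions), not GCTA settings.
    """
    snp_ids = []
    with open(bim_path) as bim:
        for line in bim:
            fields = line.split()
            if len(fields) >= 2:
                # .bim columns: chromosome, SNP ID, genetic distance, position, allele 1, allele 2
                snp_ids.append(fields[1])
    rng = random.Random(seed)
    chosen = rng.sample(snp_ids, min(n_causal, len(snp_ids)))
    with open(out_path, "w") as out:
        for snp in chosen:
            out.write(snp + "\n")
    return chosen


def write_keep_list(fam_path, out_path):
    """Write family ID and individual ID (first two .fam columns) for every sample."""
    rows = []
    with open(fam_path) as fam:
        for line in fam:
            fields = line.split()
            if len(fields) >= 2:
                rows.append((fields[0], fields[1]))
    with open(out_path, "w") as out:
        for fid, iid in rows:
            out.write(f"{fid}\t{iid}\n")
    return rows
```

With the paths from the question, this would be called as write_causal_snplist("1kGenomesP1/1kg_phase1_all/1kg_phase1_all.bim", "causal.snplist") and write_keep_list("1kGenomesP1/1kg_phase1_all/1kg_phase1_all.fam", "test.indi.list") before rerunning the gcta64 command. As far as I understand, --keep is optional, so dropping it should also work if you want to simulate phenotypes for every individual in the dataset.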