How to create a simulated SNP genotypes dataset
9.0 years ago

I am trying to create a simulated SNP dataset of ~40 individuals from two populations, genotyped at around 1000 co-dominant SNP markers and showing a predefined (low) level of population structure. The result I am looking for is similar to that produced by GenAlEx's create function, with one crucial difference: the dataset should contain four alleles overall (A, C, G, T) but only two alleles per SNP marker (A/T, C/G, C/T, etc.), with all allele combinations represented, as you would expect from a typical population-genetic SNP dataset.

Essentially, I am after a function that lets me set parameters such as the level of structure, the number of individuals, and the number of alleles overall and per SNP, and that returns a SNP dataset roughly adhering to those parameters. An R solution would be preferable, as R is the only language I am comfortable with at the moment.

R SNP • 2.8k views
9.0 years ago

PLINK is by far your best bet for this: http://pngu.mgh.harvard.edu/~purcell/plink/simulate.shtml
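If you would rather script it yourself, the parameters listed in the question (level of structure, number of individuals, two alleles per SNP drawn from A/C/G/T) can be sketched in a few lines. The block below is only a rough illustration, not PLINK's method: it assumes the Balding-Nichols model, where each population's allele frequency is drawn from a Beta distribution around a shared ancestral frequency with spread set by a target Fst, and it assumes Hardy-Weinberg proportions within each population. The function name `simulate_snps` and all of its parameters are made up for this example. It is written in Python, but the same steps map onto `runif`, `rbeta` and `rbinom` in R.

```python
import random
from itertools import combinations

NUCLEOTIDES = "ACGT"


def simulate_snps(n_per_pop=20, n_snps=1000, fst=0.02, n_pops=2, seed=1):
    """Hypothetical helper: biallelic SNP genotypes under a Balding-Nichols sketch.

    Returns (labels, genotypes): labels[i] is the population of individual i,
    genotypes[i][j] is a two-letter genotype such as "AT" at SNP j.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    rng = random.Random(seed)
    allele_pairs = list(combinations(NUCLEOTIDES, 2))  # the 6 possible pairs
    scale = (1.0 - fst) / fst  # Beta spread: lower fst gives tighter frequencies

    labels = [f"pop{k + 1}" for k in range(n_pops) for _ in range(n_per_pop)]
    genotypes = [[] for _ in labels]

    for _ in range(n_snps):
        a1, a2 = rng.choice(allele_pairs)   # the two alleles of this SNP
        p_anc = rng.uniform(0.05, 0.95)     # assumed ancestral frequency range
        row = 0
        for _pop in range(n_pops):
            # Population-specific frequency of allele a1.
            p = rng.betavariate(p_anc * scale, (1.0 - p_anc) * scale)
            for _ind in range(n_per_pop):
                # Two independent allele draws, i.e. Hardy-Weinberg proportions.
                copies = sum(rng.random() < p for _ in range(2))
                genotypes[row].append(a1 * copies + a2 * (2 - copies))
                row += 1
    return labels, genotypes


if __name__ == "__main__":
    labels, genotypes = simulate_snps()
    for label, row in list(zip(labels, genotypes))[:3]:
        print(label, " ".join(row[:10]))
```

Calling `simulate_snps(n_per_pop=20, n_snps=1000, fst=0.02)` gives 40 individuals by 1000 SNPs. The realised Fst will only roughly match the target, so it is worth estimating it from the output before relying on it.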
