Generate large GWAS dataset for use with distributed machine learning algorithms
6.6 years ago

I wanted to test the speed of a machine learning algorithm I am working on, but I do not have a large enough GWAS dataset. I've played around with HAPGEN2, but I was only able to simulate 10,000 SNPs from the 1000 Genomes Project data.

What is the easiest way to generate genotype information for at least 1M SNPs? I'm open to using any publicly available dataset. I just need a very large data matrix to validate the accuracy and speed of my algorithm.

SNP sequence sequencing gene
6.6 years ago
tpoterba ▴ 50

msprime, a coalescent simulator with a Python API, might be what you want.
