6.6 years ago
I want to test the speed of a machine learning algorithm I am working on, but I don't have a large enough GWAS dataset. I've tried HAPGEN2 with the 1000 Genomes Project as the reference panel, but I was only able to simulate data for 10,000 SNPs.
What is the easiest way to generate genotype data for at least 1M SNPs? I'm open to using any publicly available dataset; I just need a very large data matrix to validate the accuracy and speed of my algorithm.
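If the goal is only to benchmark speed, one quick option is to skip haplotype-based simulation and draw a synthetic genotype matrix directly. The sketch below assumes NumPy, treats every SNP as independent (no linkage disequilibrium) and in Hardy-Weinberg equilibrium, so each genotype is a Binomial(2, p) count of minor alleles with p drawn uniformly from an assumed MAF range. The function name `simulate_genotypes` and its parameters are hypothetical, chosen for illustration. Unlike HAPGEN2 output, this matrix has no realistic LD structure, so it suits runtime tests better than accuracy tests that depend on correlated SNPs.

```python
import numpy as np


def simulate_genotypes(n_samples, n_snps, chunk_size=10_000,
                       maf_range=(0.05, 0.5), seed=None):
    """Return (genotypes, maf) for a synthetic n_samples x n_snps matrix.

    Hypothetical helper. Assumptions: SNPs are independent (no LD) and each
    is in Hardy-Weinberg equilibrium, so a genotype is Binomial(2, maf).
    Entries are minor-allele counts 0/1/2 stored as int8 (1 byte each).
    """
    rng = np.random.default_rng(seed)
    # One minor allele frequency per SNP, uniform in [low, high).
    maf = rng.uniform(maf_range[0], maf_range[1], size=n_snps)
    genotypes = np.empty((n_samples, n_snps), dtype=np.int8)
    # Fill in column chunks so the temporary int64 array from
    # rng.binomial stays small even when n_snps is in the millions.
    for start in range(0, n_snps, chunk_size):
        stop = min(start + chunk_size, n_snps)
        # maf[start:stop] broadcasts across rows: column j uses its own p.
        genotypes[:, start:stop] = rng.binomial(
            2, maf[start:stop], size=(n_samples, stop - start))
    return genotypes, maf


if __name__ == "__main__":
    G, maf = simulate_genotypes(100, 20_000, seed=1)
    print(G.shape, G.dtype)
```

For example, `simulate_genotypes(1_000, 1_000_000, seed=1)` would produce a 1,000 x 1,000,000 matrix occupying roughly 1 GB as int8; increase the sample count with memory in mind, or write each chunk to disk instead of holding the full matrix.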