Hi there,
I need to stress-test a GWAS tool, which requires a dataset (PLINK format) with 100,000 samples and 40 million SNPs at a case:control ratio of 30:70.
I'm running the command:
plink1.9 --simulate ds1.sim --make-bed --out ds1 --simulate-ncases 30000 --simulate-ncontrols 70000
But after about 3 hours of execution it fails with "Error: File write failure", even though my environment has 256 TB of disk (it's a container with an S3 object store mounted through s3fs).
Is there a smarter way to do this? Perhaps breaking the simulation into smaller chunks and merging them later, something like the sketch below? That would avoid waiting hours for an error to happen and losing both the time and the data generated so far.
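This is roughly what I have in mind (an untested sketch; the chunk count, labels, and file names are made up, and it assumes --simulate assigns the same sample IDs and the same case/control ordering in every run, so the per-chunk filesets line up when merged):

# Simulate 40 chunks of 1M SNPs each; use a different label per chunk
# so variant IDs don't collide at merge time.
for i in $(seq 1 40); do
  echo "1000000 c${i} 0.00 1.00 1.00 1.00" > chunk${i}.sim
  plink1.9 --simulate chunk${i}.sim --simulate-ncases 30000 --simulate-ncontrols 70000 --make-bed --out chunk${i}
done

# Merge chunk 1 with chunks 2..40 into the final fileset.
for i in $(seq 2 40); do echo chunk${i}; done > merge_list.txt
plink1.9 --bfile chunk1 --merge-list merge_list.txt --make-bed --out ds1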
Thanks in advance!
s3fs has a maximum file size (which depends on the multipart_size parameter). Check that you're not exceeding this.
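With the default multipart_size of 10 (MB) and S3's 10,000-part limit, the ceiling is roughly 100 GB per file, while a 100,000-sample x 40M-SNP .bed is about 1 TB (40e6 x 100,000 / 4 bytes). A rough sketch of remounting with a larger part size (bucket name and mount point are placeholders):

# Max file size is approximately multipart_size (MB) x 10,000 parts.
fusermount -u /mnt/s3bucket
s3fs your-bucket /mnt/s3bucket -o multipart_size=200   # ~2 TB ceiling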
Hello b.ambrozio!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/11373
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry about that. I'm new to bioinformatics and not sure which community I should use...