How to simulate 100k samples with 40 million SNPs at a case:control ratio of 30:70?
Asked 4.8 years ago by b.ambrozio

Hi there,

I need to stress-test a GWAS tool, and the task requires a dataset in PLINK format with 100,000 samples and 40 million SNPs, at a case:control ratio of 30:70.

I'm running this command:

plink1.9 --simulate ds1.sim --make-bed --out ds1 --simulate-ncases 30000 --simulate-ncontrols 70000

However, after about 3 hours of execution I get "Error: File write failure.", even though my environment has 256 TB of disk space. (It's a container with an S3 object store mounted through s3fs.)

Is there a clever way to do this? Perhaps by generating the data in small chunks and merging them later? That would save me from waiting hours until an error occurs and then losing both the time and the data generated so far.

Thanks in advance!
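A minimal sketch of the chunking idea: split the 40 million SNPs across several smaller --simulate runs that all use the same --simulate-ncases and --simulate-ncontrols, so each run writes a .bed of roughly 25 GB (1 million SNPs x 25,000 bytes) and a failure only costs one chunk. The script writes one .sim file per chunk and returns the matching plink1.9 commands. Assumptions: the null-SNP .sim line ("<n SNPs> <label> <min freq> <max freq> <het OR> <hom OR>") stands in for whatever ds1.sim actually contains; the directory, file names, chunk count and seed values are hypothetical; --seed gives each chunk different genotypes.

```python
import math
from pathlib import Path

TOTAL_SNPS = 40_000_000
N_CHUNKS = 40  # hypothetical: 1 million SNPs per chunk
N_CASES, N_CONTROLS = 30_000, 70_000


def write_chunk_plan(outdir: Path, total_snps: int = TOTAL_SNPS,
                     n_chunks: int = N_CHUNKS) -> list[str]:
    """Write one .sim file per chunk and return the plink1.9 command for each."""
    outdir.mkdir(parents=True, exist_ok=True)
    per_chunk = math.ceil(total_snps / n_chunks)
    remaining = total_snps
    commands = []
    for i in range(n_chunks):
        n = min(per_chunk, remaining)
        if n <= 0:  # more chunks requested than SNPs available
            break
        remaining -= n
        sim = outdir / f"chunk{i:03d}.sim"
        # Null SNPs: allele frequency drawn from [0, 1], odds ratios 1 (no effect).
        # Assumption: PLINK derives SNP IDs from the label, so a per-chunk label
        # should keep IDs distinct across chunks.
        sim.write_text(f"{n} c{i:03d}_null 0.00 1.00 1.00 1.00\n")
        commands.append(
            f"plink1.9 --simulate {sim} --make-bed --out {outdir / f'chunk{i:03d}'} "
            f"--simulate-ncases {N_CASES} --simulate-ncontrols {N_CONTROLS} "
            f"--seed {1000 + i}"
        )
    return commands


if __name__ == "__main__":
    for cmd in write_chunk_plan(Path("sim_chunks")):  # hypothetical output directory
        print(cmd)
```

Whether the chunks can later be combined (e.g. with --bmerge or --merge-list) depends on every run assigning the same sample IDs and case/control status, which is worth checking on a small test first. Keep in mind that a merged fileset is again about 1 TB, so it would hit the same write limit unless the limit itself is raised.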


s3fs has a maximum file size (which depends on the multipart_size parameter). Check that you're not exceeding this.
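A back-of-the-envelope check of whether that limit applies here. A SNP-major PLINK .bed file has 3 magic bytes plus ceil(n_samples / 4) bytes per SNP (2 bits per genotype), so 100,000 samples x 40 million SNPs comes to about 1 TB. S3 multipart uploads are capped at 10,000 parts, so the largest file s3fs can write is roughly multipart_size x 10,000. The sketch assumes s3fs counts multipart_size in MiB (1024 x 1024 bytes); check the documentation of your s3fs version for its default and allowed range.

```python
import math

S3_MAX_PARTS = 10_000  # S3 multipart uploads allow at most 10,000 parts
MIB = 1024 * 1024      # assumption: multipart_size is given in MiB


def bed_size_bytes(n_samples: int, n_snps: int) -> int:
    """Size of a SNP-major PLINK .bed: 3 magic bytes + ceil(n_samples/4) bytes per SNP."""
    return 3 + n_snps * math.ceil(n_samples / 4)


def min_multipart_size_mib(file_bytes: int, max_parts: int = S3_MAX_PARTS) -> int:
    """Smallest whole-MiB part size that fits file_bytes into max_parts parts."""
    return math.ceil(file_bytes / (max_parts * MIB))


size = bed_size_bytes(100_000, 40_000_000)
print(size)                          # 1000000000003 (about 1 TB)
print(min_multipart_size_mib(size))  # 96
```

Under these assumptions, a part size of 10 MiB caps files at about 105 GB, far below the 1 TB .bed, whereas mounting with a multipart_size of at least 96 (passed as an s3fs -o option) would in principle allow the full file.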


Hello b.ambrozio!

It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/11373

This is typically not recommended as it runs the risk of annoying people in both communities.


Sorry about that. I'm new to bioinformatics and wasn't sure which community to use.
