Hi there,
I need to stress-test a GWAS tool, which requires a dataset (PLINK format) with 100,000 samples and 40 million SNPs at a case:control ratio of 30:70.
I'm running the command:
plink1.9 --simulate ds1.sim --make-bed --out ds1 --simulate-ncases 30000 --simulate-ncontrols 70000
But after about 3 hours of execution it fails with "Error: File write failure", even though my environment has 256 TB of disk (it's a container with an S3 object store mounted through s3fs).
Is there a smarter way to do this? Perhaps breaking the simulation into smaller chunks and merging them later, something like the sketch below? That would avoid waiting hours for an error to happen and losing both the time and the data generated so far.
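This is roughly what I have in mind (an untested sketch; the chunk count, labels, and file names are made up, and it assumes --simulate assigns the same sample IDs and the same case/control ordering in every run, so the per-chunk filesets line up when merged):

# Simulate 40 chunks of 1M SNPs each; use a different label per chunk
# so variant IDs don't collide at merge time.
for i in $(seq 1 40); do
  echo "1000000 c${i} 0.00 1.00 1.00 1.00" > chunk${i}.sim
  plink1.9 --simulate chunk${i}.sim --simulate-ncases 30000 --simulate-ncontrols 70000 --make-bed --out chunk${i}
done

# Merge chunk 1 with chunks 2..40 into the final fileset.
for i in $(seq 2 40); do echo chunk${i}; done > merge_list.txt
plink1.9 --bfile chunk1 --merge-list merge_list.txt --make-bed --out ds1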
Thanks in advance!
s3fs has a maximum file size (which depends on the multipart_size parameter). Check that you're not exceeding this.
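With the default multipart_size of 10 (MB) and S3's 10,000-part limit, the ceiling is roughly 100 GB per file, while a 100,000-sample x 40M-SNP .bed is about 1 TB (40e6 x 100,000 / 4 bytes). A rough sketch of remounting with a larger part size (bucket name and mount point are placeholders):

# Max file size is approximately multipart_size (MB) x 10,000 parts.
fusermount -u /mnt/s3bucket
s3fs your-bucket /mnt/s3bucket -o multipart_size=200   # ~2 TB ceiling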
Hello b.ambrozio!
It appears that your post has been cross-posted to another site: https://bioinformatics.stackexchange.com/questions/11373
This is typically not recommended as it runs the risk of annoying people in both communities.
Sorry about that. I'm new to bioinformatics and not sure which community I should use...