Hi guys,
This has been plaguing me for weeks. It wasn't a major issue when I first ran it, but now it just won't run. Here is my problem:
I split the work by chromosome to help with parallelism.
- I first had around 70 human WGS samples to joint genotype before VQSR. I ran it with GenotypeGVCFs with Mem=32G and NT=8.
- Then I had 130 human WGS samples to joint genotype before VQSR. I ran it with Mem=64G and NT=8. I would get a couple of file locking errors, and sometimes a random *.idx file could not be read. I was still able to run 4 chromosomes at a time.
- Then I had 101 human WGS samples to joint genotype. I could not get it to run without Mem=512G and NT=8, and I could only run 2-3 chromosomes at once. Same error messages.
Now I am trying to joint genotype about 125 human WGS samples and I can't even run one job with Mem=512G and NT=8. I tried scaling down as low as 64G/4 cores and many combinations in between. The file locking and unreadable index file errors are getting more frequent. I thought about using the CombineVariants tool, but it would take about 4 weeks to combine them all.
All of these were run with GATK 3.4. I asked on their forum and they told me to "look into Queue", which looks like a lot of work to learn just to get past the joint genotyping step.
Please help! I've been stuck on this for weeks!
Edit: So I used the argument that disables automatic index creation and locking (--disable_auto_index_creation_and_locking_when_reading_rods), which resolved the file locking errors, but now I get "Too many open files" errors.
I literally just found that too! Thanks so much. I hope it will work. I can't believe it took me over a month to discover this, because I was searching for "file lock error" instead of "unable to lock file".
Actually, now I am getting a "Too many open files" error. I checked, and my open-file limit is set to unlimited, so I don't know why I am getting this error.
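One thing that may be worth ruling out: the open-file limit a job actually runs under can differ from what a login shell reports, for example when the job is started by a cluster scheduler. Below is a minimal sketch, in Python, that prints the soft and hard limits seen by the process that runs it. The helper names open_file_limits and describe are made up for illustration, and running this inside the same job script as GATK is an assumption about the setup, not something confirmed here.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit value as text, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is the one that is enforced; the hard limit is the
    # ceiling an unprivileged process may raise the soft limit to.
    print("soft limit:", describe(soft))
    print("hard limit:", describe(hard))
```

If the soft limit printed from inside the job is a finite number rather than "unlimited", that would be one possible explanation for the error even though the interactive shell reports no limit.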
Could you share the command line of your GATK jointgenotyping?
So this is just an example. I forgot how to use a list to call variants, so I just typed out all the variant files on the command line. This is for Chr X.