Optimizing VCF File Merging Using bcftools
3 days ago
o.h3096 • 0

Hello,

I am trying to merge 1200 WGS VCF files (around 9 GB each) using bcftools. I want to merge them by chromosome to make the output files easier to work with.

I am using an HPC cluster with 156 threads and 1 TB of RAM, but I'm not sure how to optimize the resources to make the merging process faster. I used parallel, but it didn't work; I get this error:

"Could not load local index file 'path/to/file.tbi' : Too many open files"

Even when I ran bcftools for a single chromosome, I got the same error message.

Any advice on how to resolve this issue and optimize resource usage would be greatly appreciated!

Thank you,

Bcftools merge parallelize WGS • 226 views
3 days ago
GenoMax 147k

This may be related to the ulimit setting for your account. See: https://askubuntu.com/questions/1182021/too-many-open-files
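
To make the ulimit point concrete, here is a minimal sketch for inspecting the per-process limit on open files. It assumes a POSIX-style shell on Linux. The soft limit is often 1024 by default, and a merge needs at least one descriptor per input file, so 1200 inputs would exceed that. The sketch raises the soft limit only as far as the hard limit, which does not require administrator rights.

```shell
# Show the current soft and hard limits on open file descriptors.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Raise the soft limit to the hard limit for this shell session only.
# Going above the hard limit is what needs the administrator.
ulimit -Sn "$hard"
echo "new soft limit: $(ulimit -Sn)"
```

The change applies only to the current shell and the processes it starts, so in a batch job it would have to go in the job script before the bcftools call.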

3 days ago

Just so that there is also an answer here: as the link from GenoMax points out, try something like

ulimit -n 100000 

Note that the system administrator may cap how high you can set this limit. In that case, merge in batches: merge 100 files at a time, one batch per process.

That leaves you with 12 intermediate files, which you then merge together.
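
The batch approach above could be sketched as follows, restricted to one chromosome at a time as the question asks. This is an illustration under assumptions, not a tested pipeline: the file names (`all_vcfs.txt`, `batch_`, `merged.chr1.vcf.gz`) and the helper functions `make_batches` and `merge_region` are hypothetical, and the inputs are assumed to be bgzip-compressed and indexed. The bcftools options used are `merge -r` (region), `-l` (file list), `-Oz` (compressed output) and `index -t` (tabix index).

```shell
#!/bin/sh
# Two-stage merge sketch; all file names below are hypothetical.
set -eu

# Split a list of VCF paths (one per line) into chunks of at most $2 paths.
# Chunks are written as ${3}aa, ${3}ab, ... (the default suffixes of split).
make_batches() {
    split -l "$2" "$1" "$3"
}

# Merge one region in two stages: first within each batch, then across the
# per-batch results. Each bcftools process opens at most one batch of inputs.
merge_region() {
    region=$1; prefix=$2; out=$3
    stage2_list="$prefix$region.stage2.txt"
    : > "$stage2_list"
    for batch in "$prefix"[a-z][a-z]; do
        [ -e "$batch" ] || continue
        part="$batch.$region.vcf.gz"
        bcftools merge -r "$region" -l "$batch" -Oz -o "$part"
        bcftools index -t "$part"
        echo "$part" >> "$stage2_list"
    done
    bcftools merge -l "$stage2_list" -Oz -o "$out"
    bcftools index -t "$out"
}

# Example call, assuming all_vcfs.txt lists the 1200 indexed inputs:
#   make_batches all_vcfs.txt 100 batch_
#   merge_region chr1 batch_ merged.chr1.vcf.gz
```

Because the open-file limit applies per process, each batch stays under it even when several batches or chromosomes run at the same time, so the independent `merge_region` calls are a natural unit to spread across the available threads.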


Thank you all! I adjusted the ulimit, and it worked perfectly.
