I have a bunch of VCF files to merge. bcftools can't handle all of them at once, so I have to merge in batches. I would like to use GNU parallel to do this because I'm working on an Amazon EC2 instance through PuTTY, which sometimes crashes and leaves the process unfinished. How could I do this?
Edit: This is what I ended up doing, in case this is useful to anyone down the line:
ls *vcf.gz > vcf.list
parallel --max-args 30 bcftools merge {} -Oz -o batch_merge{#}.vcf.gz :::: vcf.list
This merges batches of 30 files in parallel. I had almost 900 VCFs to merge, so it went pretty quickly.
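The batch outputs still have to be combined into a single file. A minimal sketch of that second pass is below, assuming the batch_merge*.vcf.gz files from the command above sit in the current directory. The helper name merge_batches and the output name all_samples.vcf.gz are made up for illustration and are not part of bcftools.

```shell
#!/usr/bin/env bash
# Sketch only: second-pass merge of the per-batch outputs.
# merge_batches is a hypothetical helper name, not a bcftools command.

merge_batches() {
    local out=$1
    shift
    local f
    # bcftools merge expects an index next to every input file,
    # so build (or rebuild) a .tbi index for each batch first.
    for f in "$@"; do
        bcftools index -f -t "$f" || return 1
    done
    # Merge all batch files into one bgzipped VCF, then index the result.
    bcftools merge "$@" -Oz -o "$out" || return 1
    bcftools index -f -t "$out" || return 1
}

# Usage (assumed file names):
# merge_batches all_samples.vcf.gz batch_merge*.vcf.gz
```

With almost 900 inputs in batches of 30, this second pass would see about 30 files. If the session drops during the first pass, GNU parallel's --joblog and --resume options may help rerun only the unfinished batches; I have not tested that with this exact command.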
You might be interested in some of the comments here: https://shicheng-guo.github.io/bioinformatics/1923/02/28/bcftools-merge