Hello,
I am currently writing code to split a whole-genome VCF file by chromosome. Right now I do so with bcftools, writing 22 .vcf.gz files using the --targets flag so that I can avoid --regions, which requires an index (.tbi) file. However, this process is slow and expensive.
Looking for alternatives, I am considering adding an upstream step to my pipeline that creates a .tbi index from my initial whole-genome VCF, which the splitting stage could then use.
Does this addition make sense? Would this reduce time and costs? And if not, are there any alternatives that I should consider?
Thank you.
This is my current code:
for i in {1..22}; do bcftools view "${input}" --targets "chr$i" --output "${output}-chr-$i.vcf.gz" --output-type z ; done
Something like this?
parallel -j 10 bcftools view "{1}" --targets chr{2} --output "{1}-chr-{2}.vcf.gz" -Oz ::: myInput.vcf.gz ::: {1..22}
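On the indexing idea from the question: --targets streams through the whole file for every chromosome, while --regions uses the index to jump straight to the requested contig, so indexing once up front should cut the per-chromosome cost. A minimal sketch of that approach follows, assuming the input is bgzip-compressed and coordinate-sorted and that contigs are named chr1..chr22. The function name split_by_chrom and the DRY_RUN switch are made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: index the input once, then extract each autosome
# with --regions. Set DRY_RUN=1 to print the commands instead of running them.
split_by_chrom() {
  local input=$1 prefix=$2
  local run=${DRY_RUN:+echo}   # expands to "echo" in dry-run mode, else to nothing
  local i

  # Build the tabix (.tbi) index; --force overwrites an existing one.
  $run bcftools index --tbi --force "$input"

  for i in {1..22}; do
    # --regions reads only the indexed block for this chromosome.
    $run bcftools view "$input" --regions "chr$i" \
      --output-type z --output "${prefix}-chr-${i}.vcf.gz"
  done
}
```

Calling `DRY_RUN=1 split_by_chrom myInput.vcf.gz myOutput` lists the commands first, and dropping DRY_RUN runs them. Once the index exists, the per-chromosome calls are independent, so they can also be handed to parallel as in the one-liner above, with --regions in place of --targets.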