I am filtering VCF files against BED files with vcftools. I have one BED file per VCF (split by chromosome):
for i in "${chroms[@]}"; do
    vcftools --gzvcf denisovan/chr"$i"_mq25_mapab100.vcf.gz \
        --bed bed/chr"$i"_mask.bed \
        --recode --keep-INFO-all --stdout \
        | gzip -c > filtered/denisovan.filtered."$i".vcf.gz
done
This is proving extremely slow. Is there any tool that is much quicker for doing this?
Note: use bgzip rather than gzip.
Hi, what is the reasoning for using bgzip over gzip? Thanks.
bgzip compresses the file in independent blocks, which allows random access, and its output can be indexed with tabix, so extracting an arbitrary portion (a genomic interval) of your VCF is much faster.
e.g. https://software.broadinstitute.org/software/igv/VCF
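To make the bgzip/tabix point concrete, here is a minimal sketch, assuming bgzip and tabix (from htslib) are on your PATH. The function names, the file names and the example region are placeholders I made up for illustration, and the contig name in the region has to match whatever your VCF actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: recompress a gzip'd VCF with bgzip, index it with tabix,
# then pull out a single genomic interval.
# Function names, paths and regions here are hypothetical placeholders.

recompress_and_index() {
    in_vcf="$1"    # gzip-compressed VCF, e.g. filtered/denisovan.filtered.21.vcf.gz
    out_vcf="$2"   # where to write the bgzip-compressed copy
    # gzip -dc decompresses to stdout; bgzip -c writes block-compressed output to stdout
    gzip -dc "$in_vcf" | bgzip -c > "$out_vcf" &&
    # -p vcf selects the VCF preset; the index is written next to the file as .tbi
    tabix -p vcf "$out_vcf"
}

extract_region() {
    vcf="$1"       # bgzip-compressed, tabix-indexed VCF
    region="$2"    # e.g. 21:1000000-2000000 (placeholder)
    # -h also prints the VCF header lines ahead of the matching records
    tabix -h "$vcf" "$region"
}
```

Usage would look something like `recompress_and_index filtered/denisovan.filtered.21.vcf.gz denisovan.21.bgz.vcf.gz` followed by `extract_region denisovan.21.bgz.vcf.gz 21:1000000-2000000`. In your own loop, piping to `bgzip -c` instead of `gzip -c` would skip the recompression step entirely.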