Question

Efficient Way to Split Huge VCF Files by Chromosome | Inquiry

0

16 months ago

S • 0

Hello, I have been spending significant time trying to split a large VCF file into its individual chromosome files.

Here is the code I have been using (it works, but very inefficiently):

for i in {1..22}; do vcftools --chr "chr$i" --gzvcf "${input_vcf}" --recode --recode-INFO-all --out "${output_name}-chr-$i"; done

In my trials I have considered a few options:

Using tabix to index the large multi-chromosome VCF and then working that index into the code above.
The bcftools view command.

I suspect the answer lies with bgzip and tabix, but I am still stuck. I am also unsure exactly how bgzip compression and tabix indexing help when processing high-throughput data. I would greatly appreciate any help.
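For readers weighing the two options listed above, here is a minimal sketch of how they could be combined. It is not taken from any answer in this thread. It assumes the input is bgzip-compressed (plain gzip cannot be indexed by tabix and would first need recompressing with bgzip), that the contigs are named chr1 through chr22 as in the loop above, and that bcftools and tabix are on the PATH. The function name split_vcf_by_chrom and both of its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: split a bgzip-compressed VCF into one file per autosome.
# Assumes contig names chr1..chr22; adjust if the header uses 1..22.
split_vcf_by_chrom() {
    local input_vcf="$1"      # bgzip-compressed VCF (hypothetical argument)
    local output_prefix="$2"  # prefix for the per-chromosome files (hypothetical argument)
    local i

    # Build the tabix index once if it is missing. The index records where
    # each region starts in the compressed file, so later queries can seek.
    if [ ! -f "${input_vcf}.tbi" ]; then
        tabix -p vcf "${input_vcf}" || return 1
    fi

    for i in $(seq 1 22); do
        # -r fetches one contig through the index; -Oz writes bgzip-compressed VCF.
        bcftools view -r "chr${i}" -Oz \
            -o "${output_prefix}-chr-${i}.vcf.gz" "${input_vcf}" || return 1
    done
}
```

Calling split_vcf_by_chrom cohort.vcf.gz cohort (file names made up for illustration) would be expected to write cohort-chr-1.vcf.gz through cohort-chr-22.vcf.gz. The likely gain over the vcftools loop is that vcftools scans the whole file on each of the 22 passes, whereas an indexed query reads only the compressed blocks belonging to the requested chromosome.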

I appreciate your time and consideration.

Thank you

htslib bcftools vcftools vcf tabix • 1.3k views

0

What is your end goal?

ADD REPLY • link 16 months ago by Jeremy Leipzig 22k

0

The end goal is to go from a large multi-chromosome vcf file (around 600 GB) and output 22 single chromosome vcf files. Then these will be run through a pipeline that is able to process the separate pieces (the software I am using cannot run through the large file). Thank you

ADD REPLY • link 16 months ago by S • 0

score 0 · Answer 1 · 2023-07-16

0

16 months ago

GenoMax 147k

Please use the answers in this thread: Easy way to split VCF file by chromosome
