Efficient Way to Split Huge VCF Files by Chromosome | Inquiry
16 months ago
S • 0

Hello, I have been spending significant time trying to split a large VCF file into its individual chromosome files.

Here is the code I have been using (it works, but very inefficiently):

for i in {1..22}; do vcftools --chr "chr$i" --gzvcf "${input_vcf}" --recode --recode-INFO-all --out "${output_name}-chr-$i"; done

In my trials I have considered a few options:

  1. Indexing the large multi-chromosome VCF with tabix and then using that index in the loop above.
  2. Using the bcftools view command.

I suspect the answer involves bgzip and tabix, but I am still stuck. I am also unsure exactly how bgzip compression and tabix indexing help when processing high-throughput data. I would greatly appreciate any help.
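
For context on why the index matters: bgzip writes the file as a series of independently compressed blocks, and a tabix index records which blocks hold which genomic coordinates. A region query can then seek straight to the relevant blocks, whereas the vcftools loop above reads the whole file once per chromosome. A minimal sketch of the indexed approach follows, assuming bcftools is on the PATH, the input is bgzip-compressed (not plain gzip), and contigs are named chr1 to chr22. The function name split_by_chrom and the DRY_RUN switch are hypothetical, added only for illustration.

```shell
# Sketch: split a bgzip-compressed VCF by chromosome using a tabix index.
# Assumptions: bcftools is installed; the input is bgzip-compressed;
# contigs are named chr1..chr22.
# split_by_chrom and DRY_RUN are hypothetical names used for illustration.
split_by_chrom() {
    input_vcf=$1      # e.g. big.vcf.gz
    output_name=$2    # prefix for the per-chromosome files

    # With DRY_RUN=1 the commands are printed instead of executed.
    if [ "${DRY_RUN:-0}" = 1 ]; then run=echo; else run=; fi

    # Build the .tbi index once (-t selects tabix format, -f overwrites).
    $run bcftools index -t -f "$input_vcf"

    for i in $(seq 1 22); do
        # -r uses the index to read only the blocks for this chromosome;
        # -O z writes bgzip-compressed VCF to the file given by -o.
        $run bcftools view -r "chr$i" -O z \
            -o "${output_name}-chr-$i.vcf.gz" "$input_vcf"
    done
}
```

Calling split_by_chrom big.vcf.gz out would write out-chr-1.vcf.gz through out-chr-22.vcf.gz. If the input was compressed with plain gzip, indexing will fail and the file has to be recompressed with bgzip first.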

I appreciate your time and consideration.

Thank you

htslib bcftools vcftools vcf tabix • 1.3k views

What is your end goal?


The end goal is to take a large multi-chromosome VCF file (around 600 GB) and produce 22 single-chromosome VCF files. These will then be run through a pipeline that processes the pieces separately (the software I am using cannot handle the large file). Thank you
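
Given that goal, another option that needs no index is to stream the file once and route each record to its chromosome's output. The sketch below uses awk for this, assuming contigs are named chr1 to chr22 and that uncompressed VCF text arrives on standard input. The function name split_single_pass is hypothetical, and the outputs are uncompressed, which may not be practical at this file size without compressing them afterwards.

```shell
# Sketch: split a VCF stream into per-chromosome files in one pass.
# Assumptions: uncompressed VCF on stdin; contigs named chr1..chr22.
# split_single_pass is a hypothetical name used for illustration.
split_single_pass() {
    # $1: output prefix; files are named <prefix>-chr-<N>.vcf
    awk -F '\t' -v prefix="$1" '
        # Collect every header line so each output gets a full header.
        /^#/ { header = header $0 "\n"; next }

        # Keep autosomes only; other contigs (chrX, chrM, ...) are skipped.
        $1 ~ /^chr([1-9]|1[0-9]|2[0-2])$/ {
            out = prefix "-chr-" substr($1, 4) ".vcf"
            # First record for this chromosome: write the header first.
            if (!(out in seen)) { printf "%s", header > out; seen[out] = 1 }
            print > out
        }
    '
}
```

A possible invocation is bgzip -dc "$input_vcf" | split_single_pass "$output_name". The trade-off is that this decompresses the whole file once but never needs the index, while the indexed route allows each chromosome to be extracted independently.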

16 months ago
GenoMax 147k

Please use the answers in this thread: Easy way to split VCF file by chromosome
