16 months ago
S
Hello, I have been spending significant time trying to split a large VCF file into its individual chromosome files.
Here is the code I have been using (it works, but very inefficiently):
for i in {1..22}; do vcftools --chr "chr$i" --gzvcf "${input_vcf}" --recode --recode-INFO-all --out "${output_name}-chr-$i"; done
In my trials I have considered a few options:
- Using tabix to index the large multi-chromosome VCF, then incorporating the index into this code.
- The BCFtools view command.
I feel that the answer lies in bgzip and tabix, but I am still stuck. I am also unsure exactly how bgzip compression and tabix indexing help in processing high-throughput data. I would greatly appreciate any help.
I appreciate your time and consideration.
Thank you
What is your end goal?
The end goal is to go from a large multi-chromosome VCF file (around 600 GB) to 22 single-chromosome VCF files. These will then be run through a pipeline that can process the separate pieces (the software I am using cannot handle the large file). Thank you.
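One possible approach, sketched under some assumptions: the vcftools loop above rereads the whole compressed file for each of the 22 chromosomes, whereas a tabix index lets bcftools view jump straight to the requested region. The sketch below assumes the input is bgzip-compressed (tabix cannot index plain gzip output, so a file compressed with ordinary gzip would need recompressing with bgzip first), that it is sorted by position, and that chromosomes are named chr1 to chr22. The function name split_vcf_by_chrom, the run helper, and the DRY_RUN switch are hypothetical names introduced here for illustration.

```shell
# Print each command instead of executing it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Split a bgzip-compressed VCF into one file per autosome.
# $1: input VCF (assumed bgzip-compressed and position-sorted)
# $2: prefix for the output files
split_vcf_by_chrom() {
  local input_vcf="$1" output_name="$2" i

  # Build the tabix (.tbi) index once, unless it is already present.
  [ -e "${input_vcf}.tbi" ] || run tabix -p vcf "${input_vcf}"

  # -r uses the index to read only the requested chromosome;
  # -O z writes bgzip-compressed VCF output.
  for i in $(seq 1 22); do
    run bcftools view -r "chr${i}" -O z \
      -o "${output_name}-chr-${i}.vcf.gz" "${input_vcf}"
  done
}
```

Calling split_vcf_by_chrom input.vcf.gz output would index the file once and then write output-chr-1.vcf.gz through output-chr-22.vcf.gz. Setting DRY_RUN=1 first prints the commands so they can be checked before committing to a 600 GB run.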