Hello,
I have 280 VCF files, each containing about 200 SNPs from a genotyping experiment. I need to merge them into a single combined VCF that contains every SNP for every individual; where an individual lacks a SNP, its genotype in the combined file should be coded as missing (./. or .).
I am using the following command from vcftools: vcf-merge A.vcf.gz B.vcf.gz C.vcf.gz | bgzip -c > out.vcf.gz
It worked for two files, although that took about 1 hour. It has now been running for 1 day on the whole set of 280 files. Is this the only way to merge a large number of VCF files, or is there a more efficient approach?
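To make the desired output concrete, here is a minimal Python sketch of the merge semantics described above: take the union of all sites across samples and fill in ./. wherever a sample has no call. It is only a toy illustration, not a replacement for vcf-merge. The function name merge_genotypes, the (CHROM, POS, REF, ALT) site key, and the in-memory dictionaries are assumptions made for the example, and it does not parse real VCF files.

```python
from typing import Dict, List, Tuple

# Hypothetical site key for this sketch: (CHROM, POS, REF, ALT)
Site = Tuple[str, int, str, str]


def merge_genotypes(
    per_sample: Dict[str, Dict[Site, str]],
    missing: str = "./.",
) -> Tuple[List[str], List[Tuple[Site, List[str]]]]:
    """Return (sample names, rows), one row per site in the union of all sites.

    Each row holds the site and one genotype per sample, in sample order;
    samples without a call at that site get the `missing` code.
    """
    samples = sorted(per_sample)
    # Union of every site seen in any sample. Note that chromosome names
    # are sorted as plain strings here, not in karyotype order.
    all_sites = sorted({site for calls in per_sample.values() for site in calls})
    rows = [
        (site, [per_sample[name].get(site, missing) for name in samples])
        for site in all_sites
    ]
    return samples, rows


if __name__ == "__main__":
    # Toy input: sample A has no call at the second site.
    toy = {
        "A": {("1", 100, "A", "G"): "0/1"},
        "B": {("1", 100, "A", "G"): "1/1", ("1", 200, "C", "T"): "0/1"},
    }
    samples, rows = merge_genotypes(toy)
    print("#CHROM", "POS", "REF", "ALT", *samples, sep="\t")
    for (chrom, pos, ref, alt), genotypes in rows:
        print(chrom, pos, ref, alt, *genotypes, sep="\t")
```

In this toy input, sample A ends up with ./. at position 200, which is the behaviour I am after in the combined file.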
Thank you
Thanks for the answer, Pierre. A quick question about the reference genome, though: suppose we have a collection of VCF files produced at different times and therefore against different reference genomes. Is there an easy solution with the CombineVariants command, or should we lift over all the incompatible ones to a single reference genome and then combine them?