Hi all,
I have 9000+ VCF files that I'm trying to merge using bcftools merge. Each one sits in its own folder, so essentially I have one parent folder containing 9000+ separate folders, each containing a single vcf.gz file.
I tried the following command, taken from this tutorial:
bcftools merge ~/path/to/folders/*.vcf.gz -Oz -o Merged.vcf.gz
However, bcftools does not seem to recognize my command, since all I get back is the usage message:
About: Merge multiple VCF/BCF files from non-overlapping sample sets to create one multi-sample file.
Note that only records from different files can be merged, never from the same file. For
"vertical" merge take a look at "bcftools norm" instead.
Usage: bcftools merge [options] <A.vcf.gz> <B.vcf.gz> [...]
Options:
--force-samples resolve duplicate sample names
--print-header print only the merged header and exit
--use-header <file> use the provided header
-0 --missing-to-ref assume genotypes at missing sites are 0/0
-f, --apply-filters <list> require at least one of the listed FILTER strings (e.g. "PASS,.")
-F, --filter-logic <x|+> remove filters if some input is PASS ("x"), or apply all filters ("+") [+]
-g, --gvcf <-|ref.fa> merge gVCF blocks, INFO/END tag is expected. Implies -i QS:sum,MinDP:min,I16:sum,IDV:max,IMF:max
-i, --info-rules <tag:method,..> rules for merging INFO fields (method is one of sum,avg,min,max,join) or "-" to turn off the default [DP:sum,DP4:sum]
-l, --file-list <file> read file names from the file
-m, --merge <string> allow multiallelic records for <snps|indels|both|all|none|id>, see man page for details [both]
--no-version do not append version and command line to the header
-o, --output <file> write output to a file [standard output]
-O, --output-type <b|u|z|v> 'b' compressed BCF; 'u' uncompressed BCF; 'z' compressed VCF; 'v' uncompressed VCF [v]
-r, --regions <region> restrict to comma-separated list of regions
-R, --regions-file <file> restrict to regions listed in a file
--threads <int> number of extra output compression threads [0]
Any idea on what I'm doing wrong? Thanks!
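One possible cause, given the folder layout described above: the pattern `~/path/to/folders/*.vcf.gz` only matches files sitting directly inside `folders`, not files one level down in the per-sample folders. If the pattern matches fewer than two files, that would be consistent with `bcftools merge` printing its usage text. A minimal sketch of a workaround, assuming that is what is happening here, is to collect the paths with `find` and hand them over through the `-l, --file-list` option shown in the usage text. The function names `list_vcfs` and `merge_from_list` are hypothetical helpers made up for this illustration, not part of bcftools.

```shell
# list_vcfs ROOT LIST: write every .vcf.gz found below ROOT to LIST,
# one path per line, in a stable (sorted) order.
list_vcfs() {
    find "$1" -type f -name '*.vcf.gz' | sort > "$2"
}

# merge_from_list LIST OUT: merge all files named in LIST into one
# compressed VCF. Refuses to run with fewer than two inputs.
merge_from_list() {
    n=$(( $(wc -l < "$1") ))
    if [ "$n" -lt 2 ]; then
        echo "need at least two input files, found $n" >&2
        return 1
    fi
    # -l reads the input file names from LIST instead of the command line
    bcftools merge -l "$1" -Oz -o "$2"
}
```

Usage would look something like `list_vcfs ~/path/to/folders vcf_list.txt` followed by `merge_from_list vcf_list.txt Merged.vcf.gz`. The input files need to be indexed first, and with thousands of inputs the per-process open-file limit (`ulimit -n`) may also become an issue, in which case merging in batches is one option.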
Thanks Pierre, that does sound like a nice option. I rechecked a subset of files to see if they were indexed and bcftools returned that all files were already indexed. Re-running your method produced the following:
Think it would be worth re-indexing these files?
When I try to force overwrite index with the following code:
I get this error for each file:
I also tried re-compressing the vcf before re-indexing:
Which results in the error for each of the files:
In case it makes a difference: I'm essentially trying to use bcftools to merge separate .vcf.gz files that have already been annotated with Ensembl's VEP.
Do you have spaces in any of your paths?
You could try
Thanks Pierre, that seemed to do the trick! I don't have any spaces in the file names but there are dashes in them.
So it looks like you cat the list, pipe each path into the variable V, and run bcftools index on it, correct?
Cool piece of code, thank you for all your help!
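The code being discussed did not survive in this thread, so the following is only a sketch of the pattern as it is described above: read a list of paths line by line into a variable `V` and run `bcftools index` on each one. It assumes a plain text list with one path per line; the function name `index_from_list` is hypothetical, and this is not necessarily what Pierre posted.

```shell
# index_from_list LIST: (re)index every file named in LIST, one path per line.
index_from_list() {
    while IFS= read -r V; do
        [ -n "$V" ] || continue    # skip blank lines
        # -f overwrites an existing index; quoting "$V" keeps spaces and
        # dashes in the path intact. stdin is redirected so the command
        # cannot consume the rest of the list.
        bcftools index -f "$V" < /dev/null || echo "indexing failed: $V" >&2
    done < "$1"
}
```

Called as `index_from_list vcf_list.txt`, this reports any file that fails to index on stderr and carries on with the rest, which helps when only a few of several thousand files are problematic.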