I am aware that there are other questions on this topic. However, the answers aren't working for me.
I have 22 VCF files, one per chromosome, that I want to combine into a single VCF. The files do not all contain the same samples.
For example, the chr1 file has individuals SA, SB, SC and SD, the chr2 file has SB, SC and SD, and the chr20 file has only SD.
I've tried:
bcftools merge test_1.vcf.gz test_2.vcf.gz -Oz -o test.vcf.gz
Error: Duplicate sample names (panTro6), use --force-samples to proceed anyway.
I don't want duplicate samples. I have also tried:
bcftools concat test_1.vcf.gz test_2.vcf.gz -Oz -o test.vcf.gz
Different number of samples in test_2.vcf.gz. Perhaps "bcftools merge" is what you are looking for?
I have also tried:
java -jar gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar MergeVcfs -I test_1.vcf.gz -I test_2.vcf.gz -O test.vcf.gz
But this writes no file at all to the output directory, even though it prints no obvious error message. GATK CombineGVCFs is not an option either, because my VCFs are not in the format it requires.
I could run bcftools merge with --force-samples, but most samples would then appear around 22 times in the same VCF (once per chromosome file), which inflates the file size. I'd much rather have each sample appear once in the header, with its records from the different chromosomes lined up under that single column.
There would be gaps for the samples that lack certain chromosomes, but that would be fine.
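To make the layout I'm after concrete, here is a toy Python sketch. It is only an illustration of the desired result, not a way to process real VCFs: the positions and genotypes are made up, the names per_chrom and combine are hypothetical, and it ignores the VCF header, INFO and FORMAT fields entirely. I am also assuming the gaps would be written as missing genotypes (./.).

```python
# Toy stand-ins for the per-chromosome files (made-up data, not my real VCFs).
# Each entry: sample names in header order, then (CHROM, POS, genotypes) records.
per_chrom = [
    (["SA", "SB", "SC", "SD"], [("chr1", 100, ["0/1", "0/0", "1/1", "0/1"])]),
    (["SB", "SC", "SD"], [("chr2", 200, ["0/0", "0/1", "1/1"])]),
    (["SD"], [("chr20", 300, ["1/1"])]),
]


def combine(files, missing="./."):
    """Stack all records under one header that lists each sample exactly once."""
    # Union of sample names, keeping first-seen order.
    header = []
    for samples, _ in files:
        for name in samples:
            if name not in header:
                header.append(name)

    # Re-emit every record against the shared header, padding absent samples.
    rows = []
    for samples, records in files:
        for chrom, pos, genotypes in records:
            by_sample = dict(zip(samples, genotypes))
            padded = [by_sample.get(name, missing) for name in header]
            rows.append((chrom, pos, padded))
    return header, rows


header, rows = combine(per_chrom)
print("\t".join(["#CHROM", "POS"] + header))
for chrom, pos, genotypes in rows:
    print("\t".join([chrom, str(pos)] + genotypes))
```

In other words, every sample gets exactly one column, and a sample that is absent from a given chromosome file is simply shown as missing on that chromosome's rows.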
Is this possible?