Merge multiple VCF files together with different numbers of samples
3.5 years ago
hemr3 ▴ 10

I am aware that there are other questions on this topic. However, the answers aren't working for me.

I have 22 VCF files that I want to merge together. These VCF files do not all have the same samples.

For example, the chr1 file has individuals SA, SB, SC and SD, the chr2 file has SB, SC and SD, and the chr20 file has only SD.
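To make the mismatch concrete, here is a minimal sketch for listing which samples each file holds and the union across files. It assumes only standard `gzip` and `awk`; the function names `vcf_samples` and `sample_union` are made up for illustration and are not part of bcftools or GATK. It reads the sample names straight from the `#CHROM` header line.

```shell
# Print the sample names of one VCF, one per line, taken from its #CHROM header line.
# "gzip -dcf" decompresses gzip/bgzip input and passes plain-text input through unchanged.
vcf_samples() {
    gzip -dcf "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Print the union of sample names across several VCFs, in order of first appearance.
sample_union() {
    for f in "$@"; do vcf_samples "$f"; done | awk '!seen[$0]++'
}

# Example use (file names taken from the commands in this post):
#   vcf_samples test_2.vcf.gz
#   sample_union test_1.vcf.gz test_2.vcf.gz > all_samples.txt
```

With bcftools installed, `bcftools query -l file.vcf.gz` prints the same per-file sample list.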

I've tried:

bcftools merge test_1.vcf.gz test_2.vcf.gz -Oz -o test.vcf.gz

Error: Duplicate sample names (panTro6), use --force-samples to proceed anyway.

I don't want duplicate samples. I have also tried:

bcftools concat test_1.vcf.gz test_2.vcf.gz -Oz -o test.vcf.gz

Different number of samples in test_2.vcf.gz. Perhaps "bcftools merge" is what you are looking for?

I have also tried:

java -jar gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar MergeVcfs -I test_1.vcf.gz -I test_2.vcf.gz -O test.vcf.gz

This produces no output file at all in the output directory, yet it reports no overt error either. GATK CombineGVCFs is not an option because my VCFs are not in the format it requires.

I could use bcftools merge with --force-samples, but that would leave around 22 copies of most samples in the same VCF and inflate the file size. I'd much rather have each sample's chromosomes lined up under a single column of one header.

There would be gaps for the samples that lack certain chromosomes, but that would be fine.
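The layout described above can be sketched as follows: give every per-chromosome file the same sample columns, filling the gaps with missing genotypes, so that the files can then be concatenated. This is only a sketch under assumptions, not a tested recipe for these particular files. `pad_vcf` and `all_samples.txt` are hypothetical names, every input is assumed to have at least one sample column, and GT is assumed to be the first FORMAT key so that a bare `./.` is a valid missing entry.

```shell
# pad_vcf SAMPLES.txt IN.vcf
# Write IN.vcf (plain or gzip/bgzip-compressed) to stdout with its sample columns
# rewritten to exactly the names in SAMPLES.txt (one per line), in that order.
# Samples that are absent from the input get "./." at every site.
# Assumption: GT is the first FORMAT key, so "./." alone is a valid missing entry.
pad_vcf() {
    gzip -dcf "$2" | awk -F'\t' -v OFS='\t' -v list="$1" '
        BEGIN { while ((getline name < list) > 0) want[++n] = name }
        /^##/ { print; next }                        # meta lines pass through unchanged
        /^#CHROM/ {
            for (i = 10; i <= NF; i++) col[$i] = i   # sample name -> input column
            line = $1
            for (i = 2; i <= 9; i++) line = line OFS $i
            for (j = 1; j <= n; j++) line = line OFS want[j]
            print line
            next
        }
        {
            line = $1
            for (i = 2; i <= 9; i++) line = line OFS $i
            for (j = 1; j <= n; j++)
                line = line OFS ((want[j] in col) ? $(col[want[j]]) : "./.")
            print line
        }'
}

# Possible use, assuming all_samples.txt lists every sample once:
#   pad_vcf all_samples.txt test_1.vcf.gz | bgzip > test_1.padded.vcf.gz
#   pad_vcf all_samples.txt test_2.vcf.gz | bgzip > test_2.padded.vcf.gz
#   bcftools concat test_1.padded.vcf.gz test_2.padded.vcf.gz -Oz -o test.vcf.gz
```

Because the padded files share identical sample columns in identical order, `bcftools concat` should no longer complain about a different number of samples.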

Is this possible?

gatk picard bcftools vcf • 962 views