Merge VCFs with overlapping samples
5.8 years ago
Wan Shi Tong ▴ 70

I have VCFs with some overlapping samples. Is there a tool that can merge them like this?

###VCF1:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3  
SNP1...  
SNP2...  
SNP3...  

###VCF2:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample2 Sample3 Sample4  
SNP2...  
SNP3...  
SNP4...  

I want this:

###VCF1+VCF2:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4  
SNP1... (missing for Sample4)  
SNP2...  
SNP3...  
SNP4... (missing for Sample1)  

not this:

###VCF1+VCF2:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample2_2 Sample3_2 Sample4  
SNP1... (missing for Sample2_2, Sample3_2, and Sample4)  
SNP2...  
SNP3...  
SNP4... (missing for Sample1, Sample2, and Sample3)  

In this unwanted output, Sample2 and Sample3 would only have genotypes for SNP1, SNP2, and SNP3, while Sample2_2 and Sample3_2 would only have genotypes for SNP2, SNP3, and SNP4.


Is there a tool that can merge VCFs and keep only one copy of each sample?
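To make the requested behavior concrete, here is a minimal Python sketch of the merge semantics being asked for; it is not how bcftools implements merging. It assumes each VCF has already been parsed into a list of sample names plus a dict mapping each site, keyed by (CHROM, POS, REF, ALT), to one GT string per sample. That representation, the function name `merge_by_sample`, and the toy genotypes are all hypothetical. Each sample keeps a single column, and a sample with no record at a site gets `./.`. If a sample has a genotype for the same site in more than one file, the sketch keeps the first non-missing one; that tie-breaking rule is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representation of a parsed VCF:
#   (sample_names, {(CHROM, POS, REF, ALT): [GT for each sample, in header order]})
Site = Tuple[str, int, str, str]
Vcf = Tuple[List[str], Dict[Site, List[str]]]

MISSING = "./."


def merge_by_sample(inputs: List[Vcf]) -> Vcf:
    """Union of sites and samples, keeping exactly one column per sample name."""
    # Sample order: first appearance across the inputs (Sample1, Sample2, ...).
    samples: List[str] = []
    for names, _ in inputs:
        for name in names:
            if name not in samples:
                samples.append(name)

    # Collect genotypes per site and sample; the first non-missing GT wins (assumption).
    calls: Dict[Site, Dict[str, str]] = {}
    for names, records in inputs:
        for site, gts in records.items():
            row = calls.setdefault(site, {})
            for name, gt in zip(names, gts):
                if row.get(name, MISSING) == MISSING:
                    row[name] = gt

    # Emit sites sorted by the site tuple (CHROM compared as a string),
    # filling samples that have no call at a site with ./.
    merged = {site: [calls[site].get(s, MISSING) for s in samples] for site in sorted(calls)}
    return samples, merged


vcf1: Vcf = (["Sample1", "Sample2", "Sample3"], {
    ("1", 100, "A", "G"): ["0/1", "0/0", "1/1"],  # SNP1
    ("1", 200, "C", "T"): ["0/0", "0/1", "0/0"],  # SNP2
    ("1", 300, "G", "A"): ["1/1", "0/0", "0/1"],  # SNP3
})
vcf2: Vcf = (["Sample2", "Sample3", "Sample4"], {
    ("1", 200, "C", "T"): ["0/1", "0/0", "1/1"],  # SNP2
    ("1", 300, "G", "A"): ["0/0", "0/1", "0/0"],  # SNP3
    ("1", 400, "T", "C"): ["0/1", "0/1", "0/0"],  # SNP4
})

if __name__ == "__main__":
    samples, merged = merge_by_sample([vcf1, vcf2])
    print("\t".join(["CHROM", "POS", "REF", "ALT"] + samples))
    for (chrom, pos, ref, alt), gts in merged.items():
        print("\t".join([chrom, str(pos), ref, alt] + gts))
```

Running it prints a header with four sample columns and four sites: SNP1 has `./.` for Sample4 and SNP4 has `./.` for Sample1, matching the desired VCF1+VCF2 layout above.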

VCF SNP

At face value, all you need is bcftools merge. Pay close attention to the -m parameter too, which controls how multiallelic records are combined. Missing genotypes will be represented as ./.


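bcftools merge requires sample names to be unique across the input VCFs. We could use --force-samples, but then the duplicated samples are kept as separate, renamed columns (bcftools prepends the file index, e.g. 2:Sample2), which is exactly what the OP doesn't want.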
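For comparison, the small Python sketch below emulates the sample naming that the bcftools manual describes for --force-samples: a duplicated sample name gets the 1-based index of its input file prepended (e.g. 2:Sample2). It only reproduces the header, not the genotype merging, and the function name `force_samples_header` is hypothetical. It shows why the OP's two files would end up with six sample columns instead of four.

```python
from typing import List


def force_samples_header(sample_lists: List[List[str]]) -> List[str]:
    """Emulate the sample header produced by `bcftools merge --force-samples`.

    Samples are listed file by file; a name already seen in an earlier file is
    renamed by prepending the 1-based file index, e.g. 'Sample2' -> '2:Sample2'.
    """
    seen = set()
    header: List[str] = []
    for index, names in enumerate(sample_lists, start=1):
        for name in names:
            new_name = f"{index}:{name}" if name in seen else name
            seen.add(name)
            header.append(new_name)
    return header


if __name__ == "__main__":
    print(force_samples_header([["Sample1", "Sample2", "Sample3"],
                                ["Sample2", "Sample3", "Sample4"]]))
```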


Yes, that is exactly my problem. Neither vcf-merge nor bcftools merge combines samples that share a name; unfortunately, they create a separate column for each repeated sample.


It would be easier to split these into single-sample VCFs, run bcftools concat --allow-overlaps --remove-duplicates on the pieces that belong to the same sample to combine them into one VCF per sample, and then merge everything with bcftools merge. This works; I have done it before for this type of situation.

5.7 years ago
Getting there ▴ 120

You need bcftools concat. I used the command below and got the result you described.

bcftools concat -a filtered_indels_annotated.vcf.gz filtered_snps_annotated.vcf.gz -Ov -o filtered_BC_merged.vcf

There is some useful information on the -a option (which requires bgzip-compressed, indexed input files) here: https://samtools.github.io/bcftools/bcftools.html#norm


For concat to work, the sample columns of all inputs must match exactly. From the bcftools manual:

All source files must have the same sample columns appearing in the same order.
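A quick way to check whether two files meet that requirement is to compare their sample lists (on the command line, `bcftools query -l` prints them). The Python sketch below does the same check on plain-text VCF content by reading the columns after FORMAT on the `#CHROM` header line; the function names are hypothetical. For the two files in the question it returns False, which is why concat alone cannot combine them and the per-sample split suggested earlier in the thread is needed.

```python
from typing import List

FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def sample_columns(vcf_text: str) -> List[str]:
    """Return the sample names listed after FORMAT on the '#CHROM' header line."""
    for line in vcf_text.splitlines():
        if line.startswith("#CHROM"):
            return line.split("\t")[FIXED_COLUMNS:]
    raise ValueError("no #CHROM header line found")


def concat_compatible(*vcf_texts: str) -> bool:
    """True if every input lists the same sample columns in the same order."""
    first = sample_columns(vcf_texts[0])
    return all(sample_columns(text) == first for text in vcf_texts[1:])


HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
vcf1 = "##fileformat=VCFv4.2\n" + HEADER + "Sample1\tSample2\tSample3\n"
vcf2 = "##fileformat=VCFv4.2\n" + HEADER + "Sample2\tSample3\tSample4\n"

if __name__ == "__main__":
    print(sample_columns(vcf1), sample_columns(vcf2))
    print(concat_compatible(vcf1, vcf2))
```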
