Merge two VCFs, keep only intersection of REF/ALT alleles
3.8 years ago

Hi all,

I would like to merge two VCF files, chr1g.vcf.gz and chr1hk.vcf.gz, so that the resulting file contains only the records present in both files (the intersection).

chr1g.vcf.gz excerpt:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10031  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       TAACC  NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc

..

chr1hk.vcf.gz excerpt:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc

..

Example of the desired merged output:

CHROM  POS    ID  REF     ALT    QUAL  FILTER   INFO
chr1   10055  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10061  NA  T       C      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  A       T      NA    AS_VQSR  AC=0,etc
chr1   10109  NA  AACCCT  A      NA    AS_VQSR  AC=0,etc

..

The command I have been working with is:

bcftools merge --merge none chr1g.vcf.gz chr1hk.vcf.gz > chr1merge.vcf

This merges records only when their REF/ALT alleles match, but the output is the union of the two original files. How can I tweak it to keep only the intersection?

Thank you!

vcf vcftools bcftools intersection merge • 2.1k views
3.8 years ago

How can I tweak it to keep only the intersection?

Process the output of bcftools merge with an invocation of bcftools isec.


Thank you for the reply! I was trying to avoid using bcftools isec after the merge because it outputs four extremely large data sets; however, if this is the only option, I will work with it.
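If the size of the default isec output is the concern, one option that may help is to restrict what isec writes. The sketch below assumes both inputs are bgzip-compressed and indexed. The helper name shared_from_first, its argument names, and the output directory are hypothetical and not part of bcftools.

```shell
# Sketch only: have isec write one VCF instead of its full set of outputs.
# Assumes both inputs are bgzip-compressed and indexed.
shared_from_first() {
    first="$1"; second="$2"; outdir="$3"   # hypothetical argument names
    # -n=2    : keep sites present in both input files
    # -c none : treat records as the same only when REF and ALT are identical
    # -w1     : write records from the first input only
    # -p DIR  : directory that receives the output
    bcftools isec -n=2 -c none -w1 -p "$outdir" "$first" "$second"
}

# Example call (file names from the question; isec_out is an assumed name):
# shared_from_first chr1g.vcf.gz chr1hk.vcf.gz isec_out
```

With -w1 only the first file's shared records are written, so the result should end up in a single VCF inside the output directory (expected to be isec_out/0000.vcf) rather than in four separate subsets.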


Hello, how would someone do this? From what I can tell, isec produces a list of sites but doesn't produce a merged file. Would you need to then use merge -R isec_output.vcf.gz myfile1.vcf.gz myfile2.vcf.gz to get the merged file for only unique sites?
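One possible way to combine the two tools is to run isec first and merge afterwards, rather than the other way around. The sketch below is not taken from the answer above. It assumes both inputs are bgzip-compressed and indexed, and the helper name merge_shared and its arguments are hypothetical.

```shell
# Sketch only: subset both files to their shared records, then merge the subsets.
# Assumes both inputs are bgzip-compressed and indexed.
merge_shared() {
    first="$1"; second="$2"; outdir="$3"; merged="$4"   # hypothetical argument names
    # Step 1: write the shared records of each input (exact REF/ALT match)
    # to outdir/0000.vcf.gz and outdir/0001.vcf.gz.
    bcftools isec -n=2 -c none -Oz -p "$outdir" "$first" "$second" || return 1
    # Step 2: (re)build the indexes that merge needs.
    bcftools index -f "$outdir/0000.vcf.gz" || return 1
    bcftools index -f "$outdir/0001.vcf.gz" || return 1
    # Step 3: merge the two subsets without creating new multiallelic records.
    bcftools merge --merge none -Oz -o "$merged" \
        "$outdir/0000.vcf.gz" "$outdir/0001.vcf.gz"
}

# Example call (file names from the question; the other names are assumptions):
# merge_shared chr1g.vcf.gz chr1hk.vcf.gz isec_out chr1merge.vcf.gz
```

Because both subsets contain the same sites and alleles, the merged file should hold only the intersection. If the two files share sample names, merge may need additional options that are not covered here.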

3.8 years ago
Elucidata ▴ 270

A common tool for intersecting VCF files is bedtools intersect (also available as intersectBed). Its flags let you report either the entries that overlap between files or the entries exclusive to one file. Note that bedtools compares records by genomic coordinate overlap and does not itself compare REF/ALT alleles, so two different variants at the same position count as overlapping.

The full range of uses is described in the bedtools intersect documentation.

In your case, to get the overlapping (intersecting) entries of the two files, with the output containing the full entries of chr1g.vcf.gz (including the REF/ALT columns), you can use the following command:

bedtools intersect -header -wa -a chr1g.vcf.gz -b chr1hk.vcf.gz > intersect.vcf

or, equivalently:

intersectBed -header -wa -a chr1g.vcf.gz -b chr1hk.vcf.gz > intersect.vcf

To see the overlapping entries from chr1hk.vcf.gz, use the -wb flag in place of -wa.
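Since bedtools intersect reports records whose coordinates overlap, a variant such as T>TAACC at 10061 would still be reported when the other file only has T>C at that position. If an exact REF/ALT match is required, one way to check is an allele-aware filter. The sketch below uses awk on uncompressed, tab-delimited VCF text; the function name shared_records is hypothetical, and it compares the whole ALT field as a string, so multiallelic records are not split.

```shell
# Sketch only: print the records of VCF A whose CHROM, POS, REF and ALT
# also occur in VCF B. Inputs are plain-text, tab-delimited VCF files
# (decompress .vcf.gz first, for example with gzip -dc).
shared_records() {
    # $1 = VCF A, $2 = VCF B; B is read first to build the lookup table.
    awk -F'\t' '
        /^#/      { if (NR != FNR) print; next }           # keep header lines of A only
        NR == FNR { seen[$1 FS $2 FS $4 FS $5] = 1; next } # index the records of B
        ($1 FS $2 FS $4 FS $5) in seen                     # print matching records of A
    ' "$2" "$1"
}

# Example call (assumes decompressed copies of the files from the question):
# shared_records chr1g.vcf chr1hk.vcf > intersect.vcf
```

The sketch assumes B contains at least one line; INFO and other columns are taken from A unchanged.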
