How to merge vcf files with different variants but same samples?
6.6 years ago

I have several VCF files with exactly the same meta-information (header) lines and the same column names for the fixed and genotype (gt) regions, but different variants. I want to merge them into a single VCF file with the same header and the combined fixed and gt records, like this:

file1.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

file2.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

merged.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1

What have you tried? Have you checked vcftools/bcftools? Also, please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

6.6 years ago

Just use bcftools concat. You should additionally get into the habit of normalising your VCF files prior to performing downstream analyses on them. This can be done with bcftools norm -m-any (I have not done that for the purposes of this answer):

bgzip file1.vcf
bgzip file2.vcf

tabix -p vcf file1.vcf.gz
tabix -p vcf file2.vcf.gz

bcftools concat file1.vcf.gz file2.vcf.gz

##fileformat=VCFv4.1 
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=1>
##bcftools_concatVersion=1.2+htslib-1.2.1
##bcftools_concatCommand=concat file1.vcf.gz file2.vcf.gz
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
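For anyone who does want the normalisation step folded in, here is a rough sketch rather than a tested pipeline. The function name normalise_then_concat and the temporary file layout are made up for illustration; it assumes bcftools is on your PATH and that the inputs are bgzip-compressed.

```shell
# Hypothetical wrapper (not part of bcftools); requires bcftools on PATH.
# Usage: normalise_then_concat merged.vcf.gz file1.vcf.gz file2.vcf.gz
normalise_then_concat() {
    out=$1
    shift
    workdir=$(mktemp -d) || return 1
    i=0
    for vcf in "$@"; do
        i=$((i + 1))
        # Split multiallelic records into one line per ALT allele.
        bcftools norm -m-any -Oz -o "$workdir/$i.vcf.gz" "$vcf" || return 1
        # concat -a needs an index for every input.
        bcftools index -t "$workdir/$i.vcf.gz" || return 1
    done
    # -a allows inputs whose positions overlap or interleave.
    bcftools concat -a -Oz -o "$out" "$workdir"/*.vcf.gz || return 1
    bcftools index -t "$out" || return 1
    rm -rf "$workdir"
}
```

Nothing runs until you call the function. Left-alignment of indels would additionally need a reference FASTA passed to bcftools norm with -f, which is left out here.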
6.6 years ago

Since both VCFs contain the same samples and identical headers:

$ cat test1.vcf <(awk '!/^#/' test2.vcf)

##fileformat=VCFv4.1 
##FILTER=<ID=PASS,Description="Passed all filters">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Read Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  S1  S2  S3
1   10  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   11  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   12  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   13  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
1   14  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 0/0 0/1
1   15  .   C   A   .   .   DP=3;CALLER=Samtools    GT  .   .   1/1
1   16  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/0 0/0 0/0
1   17  .   C   A   .   .   DP=3;CALLER=Samtools    GT  0/1 1/1 1/1
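Plain concatenation like this is only safe when the headers really are identical, so it may be worth checking first. A minimal sketch, assuming uncompressed files; same_vcf_header is a made-up helper name, and the demo files are abbreviated stand-ins, not valid VCF.

```shell
# Hypothetical helper: succeed only if both files carry the same '#' lines
# (trailing whitespace is ignored in the comparison).
same_vcf_header() {
    h1=$(awk '/^#/ { sub(/[ \t]+$/, ""); print }' "$1")
    h2=$(awk '/^#/ { sub(/[ \t]+$/, ""); print }' "$2")
    [ "$h1" = "$h2" ]
}

# Made-up, abbreviated two-sample files, for illustration only.
demo=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tS1\tS2\n1\t10\t0/1\t0/0\n' > "$demo/x.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tS1\tS2\n1\t14\t1/1\t0/1\n' > "$demo/y.vcf"

if same_vcf_header "$demo/x.vcf" "$demo/y.vcf"; then
    # Whole first file, then only the records of the second.
    { cat "$demo/x.vcf"; awk '!/^#/' "$demo/y.vcf"; } > "$demo/xy.vcf"
else
    echo "headers differ; use a header-aware tool instead" >&2
fi
```

If the check fails (different sample columns, say), concatenating by hand would silently produce a broken file, so falling back to bcftools is the safer route.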

cough sorting cough


Records are already coordinate sorted.


That's not all the data, surely. Better safe than sorry.
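If you stay with the plain-text route, sorting can be added cheaply. This is a sketch under assumptions: uncompressed, tab-delimited files with a shared header; concat_sorted_vcf is a made-up name; chromosomes are ordered as plain text (so 10 sorts before 2), which may not match the order your downstream tools expect.

```shell
# Hypothetical helper: header from the first file, records from all files,
# sorted by chromosome (text order) and then position (numeric).
concat_sorted_vcf() {
    awk '/^#/' "$1"
    awk 'NF && !/^#/' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

# Made-up, abbreviated records, deliberately passed in the wrong order.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t14\t.\tC\tA\n' > "$tmp/b.vcf"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\n1\t10\t.\tC\tA\n' > "$tmp/a.vcf"
concat_sorted_vcf "$tmp/b.vcf" "$tmp/a.vcf" > "$tmp/merged.vcf"
cat "$tmp/merged.vcf"
```

For real data with many contigs, bcftools sort is probably the more robust choice.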

5.7 years ago
Renesh ★ 2.2k

See this post on merging VCF files: https://reneshbedre.github.io/blog/mergevcf.html
