bcftools error merging two VCFs: REF prefixes differ
0
0
Entering edit mode
14 months ago
Shane ▴ 20

Hi all,

i am trying to merge two VCF files using bcftools merge. However, my command bcftools merge -m id VCF_d.vcf.gz VCF_p.vcf.gz -o merged.vcf.gz --force-samples returns the following

The REF prefixes differ: TG vs GA (2,2) Failed to merge alleles at 18:786377 in VCF_d.vcf.gz

These are the entries in the above mentioned files.

VCF_d 18 786377 AX-120223469 TG T . . AC=0;AN=4144

VCF_p 18 786377 AX-120223469 GA T . . AN=0;AC=0

the header info for VCF_d is

fileformat=VCFv4.1 
FILTER=<ID=PASS,Description="All filters passed" file
Date=20200421 
source=apt-format-result:1.20.0(2.8.0) 
reference=NCBI37 
phasing=none 
FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype" calls-file=./AxiomGT1.calls.txt annotation-file=/home/ubuntu/projects/data_release/Axiom_PMRA.na35.annot.db/Axiom_PMRA.na35.annot.db notes="Reporting for indels: For insertions, the reported chromosome position will be the location of first base that was inserted. For deletions, it will be the location of the preceding base even for a deletion within a nucleotide repeat (e.g., TTT[-/T]TTTTT)." 
contig=<ID=1,length=249250621> 
contig=<ID=2,length=243199373> 
contig=<ID=3,length=198022430> 
contig=<ID=4,length=191154276> 
contig=<ID=5,length=180915260> 
contig=<ID=6,length=171115067> 
contig=<ID=7,length=159138663> 
contig=<ID=8,length=146364022> 
contig=<ID=9,length=141213431> 
contig=<ID=10,length=135534747> 
contig=<ID=11,length=135006516> 
contig=<ID=12,length=133851895> 
contig=<ID=13,length=115169878> 
contig=<ID=14,length=107349540> 
contig=<ID=15,length=102531392> 
contig=<ID=16,length=90354753> 
contig=<ID=17,length=81195210> 
contig=<ID=18,length=78077248> 
contig=<ID=19,length=59128983> 
contig=<ID=20,length=63025520> 
contig=<ID=21,length=48129895> 
contig=<ID=22,length=51304566> 
contig=<ID=MT,length=16569> 
contig=<ID=X,length=155270560> 
contig=<ID=Y,length=59373566> 
export-vcf-file=VCF_ALL.vcf INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes"> INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes"> bcftools_viewVersion=1.6+htslib-1.6

VCF_p is

fileformat=VCFv4.1  
FILTER=<ID=PASS,Description="All filters passed"> fileDate=20180409 source=apt-format-result:1.20.0(2.8.0) phasing=none 
FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> batch-folder=L:\Genotyping\other\AxiomOutputLogs\GT_17_003 annotation-file=C:\Users\Public\Documents\Library\Axiom_PMRA.r2\Axiom_PMRA.na35.annot.db notes="Reporting for indels: For insertions, the reported chromosome position will be the location of first base that was inserted. For deletions, it will be the location of the preceding base even for a deletion within a nucleotide repeat (e.g., TTT[-/T]TTTTT)." export-vcf-file=L:\Genotyping\GenetitanResults\VCFs\rawVCFs\GT_17_003_VCF_ALL.vcf 
contig=<ID=1,length=249250621> 
contig=<ID=2,length=243199373> 
contig=<ID=3,length=198022430> 
contig=<ID=4,length=191154276> 
contig=<ID=5,length=180915260> 
contig=<ID=6,length=171115067> 
contig=<ID=7,length=159138663> 
contig=<ID=8,length=146364022> 
contig=<ID=9,length=141213431> 
contig=<ID=10,length=135534747> 
contig=<ID=11,length=135006516> 
contig=<ID=12,length=133851895> 
contig=<ID=13,length=115169878> 
contig=<ID=14,length=107349540> 
contig=<ID=15,length=102531392> 
contig=<ID=16,length=90354753> 
contig=<ID=17,length=81195210> 
contig=<ID=18,length=78077248> 
contig=<ID=19,length=59128983> 
contig=<ID=20,length=63025520> 
contig=<ID=21,length=48129895> 
contig=<ID=22,length=51304566> 
contig=<ID=MT,length=16569> 
contig=<ID=X,length=155270560> 
contig=<ID=Y,length=59373566> 
liftOverProgram=CrossMap(https://sourceforge.net/projects/crossmap/) liftOverFile=/home/ubuntu/projects/uploads/genotyping/scripts/chains/GRCh37_to_GRCh38.chain.gz new_reference_genome=/home/ubuntu/projects/uploads/genotyping/scripts/ref/Homo_sapiens.GRCh38.dna.toplevel.fa 
liftOverTime=December10,2018

The contig info length is the same in both VCF files. Both VCF files were sent to me from a company. Surely they would not use different reference genomes as the VCF contain SNPs from one cohort so will need to merge together before analysing.

I would appreciate if anyone could help me out!

bcftools VCF • 1.2k views
ADD COMMENT
0
Entering edit mode

I can't find information with respect to the reference genome used to call the variants in VCF_d, but in VCF_p it seems they used GRCh37 as reference, and then they liftover the variant positions into GRCh38 using Crossmap liftover tool. Can you check if the liftover process was also done for VCF_d, or if directly the reference genome used was GRCh38?

ADD REPLY
0
Entering edit mode

I don't think the liftover process was done for VCF_d but it does say reference=NCBI37. On the NCBI link here (https://www.ncbi.nlm.nih.gov/snp/?term=rs201777329) the particular SNP causing me the issue seems to be at two different positions 18:786377 (GRCh38) and 18:786378 (GRCh37).

ADD REPLY
1
Entering edit mode

Then it looks like one VCF is based on GRCh37 coordinates and the other one on GRCh38. In order to merge, you need both VCFs to be based on the same reference genome. I guess, since a company did this, the best is to contact them, so that they can liftover the variants to GRCh38.

ADD REPLY
0
Entering edit mode

Thanks for your help. I will do that. Just to update, I downloaded the reference fasta used by the company. I then used CrossMap and lifted it over to GRCh38.

It seems to have merged successfully.

ADD REPLY
0
Entering edit mode

Edit your post and make the following changes:

  1. Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. You used the double quote option, which is for quoting plain text from a source, not for formatting code as it ends up messing with code elements. Paste the VCF headers again and format them properly as it is not fixable in its current form.
  2. Remove the merge tag - bcftools and VCF are subject matter relevant tag, merge is not a tag, it's merely an operation you want to perform.
ADD REPLY

Login before adding your answer.

Traffic: 1918 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6