Inconsistency in SNP calls using bcftools merge for multiple vcf files
0
0
Entering edit mode
3.5 years ago
evafinegan • 0

Hello,

I used bcftools to merge multiple vcf files:

bcftools merge /path/freebayes_genome_1.vcf.gz /path/freebayes_genome_2.vcf.gz /path/freebayes_genome_3.vcf.gz /path/freebayes_genome_4.vcf.gz /path/freebayes_genome_5.vcf.gz  > freebayes_genome.vcf

However, I noticed some inconsistencies with individual vcf file vs merge vcf file. For example,

freebayes_genome_1.vcf.gz

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3 Sample4
CHR7    56667   .   C   T   2087.31 .   AB=0;ABP=0;AC=58;AF=1;AN=58;AO=54;CIGAR=1X;DP=54;DPB=54;DPRA=0;EPP=3.6537;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=29;NUMALT=1;ODDS=10.2805;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=2223;QR=0;RO=0;RPL=26;RPP=3.17115;RPPR=0;RPR=28;RUN=1;SAF=27;SAP=3.0103;SAR=27;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.ILLUMINA=1  GT:DP:AD:RO:QR:AO:QA:GL .:.:.:.:.:.:.:. .:.:.:.:.:.:.:. .:.:.:.:.:.:.:. 1/1:1:0,1:0:0:1:42:-4.19317,-0.30103,0

And merged vcf file freebayes_genome.vcf shows:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  Sample1 Sample2 Sample3 Sample4
CHR7    56667   .   C   T   2420.69 .   DPB=34;EPPR=0;GTI=0;MQMR=0;NS=18;NUMALT=1;ODDS=9.81148;PAIREDR=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;SRF=0;SRP=0;SRR=0;DP=220;AB=0;ABP=0;AF=0.964286;AO=49;CIGAR=1X;DPRA=0.907407;EPP=15.8176;LEN=1;MEANALT=1;MQM=60;PAIRED=1;PAO=0;PQA=0;QA=1993;RPL=15;RPP=19.0083;RPR=34;RUN=1;SAF=28;SAP=5.18177;SAR=21;TYPE=snp;technology.ILLUMINA=1;AN=234;AC=232   GT:DP:AD:RO:QR:AO:QA:GL ./.:.:.:.:.:.:.:.   ./.:.:.:.:.:.:.:.   ./.:.:.:.:.:.:.:.   ./.:.:.:.:.:.:.:.

I am confused why is there difference between the calls for same SNP and samples. Can you help me to figure out what is wrong here. Thank you!

bcftools vcf snp • 1.8k views
ADD COMMENT
0
Entering edit mode

How do these VCF files have the same Samples? bcftools merge is used to combine VCF files that have variant calls for different samples, and from your example here, you're either showing us a subset of the samples from the second file, or are using bcftools merge wrong.

ADD REPLY
0
Entering edit mode

I showed a subset of the samples from the second file.

ADD REPLY
0
Entering edit mode

That clarifies things, thank you. It's odd that you're losing a called GT. Can you check if a sample that gets a variant called at this location indeed has a variant in the individual file it is from?

ADD REPLY
0
Entering edit mode

Thanks! I visualized in IGV and found that it is a true variant that has been called in freebayes_genome_1.vcf.gz.

ADD REPLY
0
Entering edit mode

That doesn't really help with debugging the merge process. You have a variant at the location in the merged vcf file. This variant is not in Sample4, which means it has to be in a different sample. Find out which sample has the variant in the merged VCF file, and see if it is called in its individual VCF file as well. If not, you will have a lead on an entry where a WT was flipped to a Mut and a Mut to a WT. If the individual file also has a variant at that location, you will have something to compare Sample4 to - and come up with hypotheses on why Sample4 got a no-call.

ADD REPLY
0
Entering edit mode

I think there is a misunderstanding here. There is a variant in individual vcf file (which I can confirm based on IGV) i.e., freebayes_genome_1.vcf.gz. After merging the files using bcftools, this variant changes to ./.:.:.:.:.:.:.:. in file freebayes_genome.vcf

ADD REPLY
0
Entering edit mode

I understand that part. The existence of the record in the merged file must mean that there is another sample with a variant at that locus. Can you see if the merged file has any sample that has a variant at that locus?

ADD REPLY
0
Entering edit mode

Yes, there are other samples that have variants at that locus. I noticed the calls are changing for all the samples only from freebayes_genome_1.vcf.gz file in the merging step.

ADD REPLY

Login before adding your answer.

Traffic: 2520 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6