Entering edit mode
3.5 years ago
evafinegan
•
0
Hello,
I used bcftools
to merge multiple vcf files:
bcftools merge /path/freebayes_genome_1.vcf.gz /path/freebayes_genome_2.vcf.gz /path/freebayes_genome_3.vcf.gz /path/freebayes_genome_4.vcf.gz /path/freebayes_genome_5.vcf.gz > freebayes_genome.vcf
However, I noticed some inconsistencies with individual vcf file vs merge vcf file. For example,
freebayes_genome_1.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4
CHR7 56667 . C T 2087.31 . AB=0;ABP=0;AC=58;AF=1;AN=58;AO=54;CIGAR=1X;DP=54;DPB=54;DPRA=0;EPP=3.6537;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=29;NUMALT=1;ODDS=10.2805;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=2223;QR=0;RO=0;RPL=26;RPP=3.17115;RPPR=0;RPR=28;RUN=1;SAF=27;SAP=3.0103;SAR=27;SRF=0;SRP=0;SRR=0;TYPE=snp;technology.ILLUMINA=1 GT:DP:AD:RO:QR:AO:QA:GL .:.:.:.:.:.:.:. .:.:.:.:.:.:.:. .:.:.:.:.:.:.:. 1/1:1:0,1:0:0:1:42:-4.19317,-0.30103,0
And merged vcf file freebayes_genome.vcf
shows:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample4
CHR7 56667 . C T 2420.69 . DPB=34;EPPR=0;GTI=0;MQMR=0;NS=18;NUMALT=1;ODDS=9.81148;PAIREDR=0;PQR=0;PRO=0;QR=0;RO=0;RPPR=0;SRF=0;SRP=0;SRR=0;DP=220;AB=0;ABP=0;AF=0.964286;AO=49;CIGAR=1X;DPRA=0.907407;EPP=15.8176;LEN=1;MEANALT=1;MQM=60;PAIRED=1;PAO=0;PQA=0;QA=1993;RPL=15;RPP=19.0083;RPR=34;RUN=1;SAF=28;SAP=5.18177;SAR=21;TYPE=snp;technology.ILLUMINA=1;AN=234;AC=232 GT:DP:AD:RO:QR:AO:QA:GL ./.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.
I am confused why is there difference between the calls for same SNP and samples. Can you help me to figure out what is wrong here. Thank you!
How do these VCF files have the same Samples?
bcftools merge
is used to combine VCF files that have variant calls for different samples, and from your example here, you're either showing us a subset of the samples from the second file, or are usingbcftools merge
wrong.I showed a subset of the samples from the second file.
That clarifies things, thank you. It's odd that you're losing a called GT. Can you check if a sample that gets a variant called at this location indeed has a variant in the individual file it is from?
Thanks! I visualized in IGV and found that it is a true variant that has been called in
freebayes_genome_1.vcf.gz
.That doesn't really help with debugging the merge process. You have a variant at the location in the merged vcf file. This variant is not in Sample4, which means it has to be in a different sample. Find out which sample has the variant in the merged VCF file, and see if it is called in its individual VCF file as well. If not, you will have a lead on an entry where a WT was flipped to a Mut and a Mut to a WT. If the individual file also has a variant at that location, you will have something to compare Sample4 to - and come up with hypotheses on why Sample4 got a no-call.
I think there is a misunderstanding here. There is a variant in individual vcf file (which I can confirm based on IGV) i.e.,
freebayes_genome_1.vcf.gz
. After merging the files using bcftools, this variant changes to./.:.:.:.:.:.:.:.
in filefreebayes_genome.vcf
I understand that part. The existence of the record in the merged file must mean that there is another sample with a variant at that locus. Can you see if the merged file has any sample that has a variant at that locus?
Yes, there are other samples that have variants at that locus. I noticed the calls are changing for all the
samples
only fromfreebayes_genome_1.vcf.gz
file in the merging step.