3.6 years ago
shubhamkumbhar420
Hey guys, I want to merge VCF files generated by three different variant callers, without any duplicates in the merged result. The BAM file is the same for all three variant callers.
1. BCFtools
bcftools mpileup -Ou -f hg19.fa.gz NIST7035_BWA_Samtools_sorted_PCR_RG.bam | bcftools call -mv -Ov -o NIST7035_BWA_Samtools_sorted_PCR_RG_bcftools_call.vcf
2. GATK HaplotypeCaller
java -jar -Xmx6G gatk.jar HaplotypeCaller -R /mnt/x/linux/NIST_Garvan/hg19.fa.gz -I /mnt/x/linux/NIST_Garvan/NIST7035_BWA_Samtools_sorted_PCR_RG.bam -O /mnt/x/linux/NIST_Garvan/NIST7035_BWA_Samtools_sorted_PCR_RG_GATK_HaplotypeCaller.vcf
3. Freebayes
freebayes -f hg19_freebayes.fa NIST7035_BWA_Samtools_sorted_PCR_RG.bam > NIST7035_BWA_Samtools_sorted_PCR_RG_Freebayes.vcf
I am using bcftools merge, but I think I am getting duplicate calls.
A suggestion with the command line will be very helpful.
Thank you!!!!
show us an example please
I am not sure about that, but BCFtools generates 671010 variants, GATK generates 316799, and Freebayes generates 593455. The merged file goes up to 960578 variants.
what is the output of
And some more
What was the command to merge?
bcftools merge --force-samples file1.vcf file2.vcf file3.vcf >file123.vcf
Hi, what, in your definition, is a duplicate? POS? ID? Please take a look at the --merge flag with bcftools merge. Also, prior to merging these files, I would normalise them by using bcftools norm -m-any -f ref.fasta
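The suggestion above could be sketched roughly as follows, assuming bcftools is installed. The helper names, ref.fasta and all file names are placeholders, and --merge none is only one of the available modes (it writes records with different alleles on separate lines instead of creating new multiallelic records).

```shell
# Hypothetical helper: split multiallelic sites, left-align indels,
# then compress and index the result.
# $1 = reference FASTA, $2 = input VCF, $3 = output .vcf.gz (all placeholders)
normalise_vcf() {
    bcftools norm -m-any -f "$1" "$2" -Oz -o "$3" &&
    bcftools index -t "$3"
}

# Hypothetical helper: merge normalised files without creating new
# multiallelic records.
# $1 = output .vcf.gz, remaining arguments = normalised, indexed inputs
merge_normalised() {
    out=$1; shift
    bcftools merge --merge none --force-samples -Oz -o "$out" "$@"
}

# Example usage (file names are placeholders):
# normalise_vcf ref.fasta caller1.vcf caller1.norm.vcf.gz
# merge_normalised merged.vcf.gz caller1.norm.vcf.gz caller2.norm.vcf.gz caller3.norm.vcf.gz
```

bcftools merge expects bgzip-compressed, indexed inputs, which is why the first helper compresses and indexes its output.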
I think there are some duplicates having the same ID. But that's not my main issue here! I am calling variants with 3 different variant callers on the same sample (NA12878 GIAB Garvan data).
BCFtools generates 671010 variants, GATK generates 316799 and Freebayes generates 593455 variants. And merged file goes up to 960578 variants.
Now I have a VCF file from the same project to use as a reference, but it has only 416818 variants. I don't know why there is so much difference!!!
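For orientation: the three call sets sum to 671010 + 316799 + 593455 = 1581264 records, so a merged total of 960578 suggests that many sites shared between callers were collapsed rather than duplicated (how many depends on the merge mode). One way to make "duplicate" concrete is to count records that share the same CHROM, POS, REF and ALT. The sketch below does that with awk on a toy VCF whose records are made up for illustration; it is not a bcftools feature, and it assumes an uncompressed VCF.

```shell
# Toy VCF, made up for illustration; point "$toy" at a real uncompressed VCF instead.
toy=$(mktemp)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\n'
    printf 'chr1\t100\t.\tA\tG\t60\tPASS\t.\n'
    printf 'chr1\t200\t.\tC\tT\t40\tPASS\t.\n'
} > "$toy"

# Count data records (header lines start with '#').
total=$(grep -vc '^#' "$toy")

# Count distinct CHROM:POS:REF:ALT keys among the data records.
unique=$(grep -v '^#' "$toy" |
    awk -F'\t' '{print $1":"$2":"$4":"$5}' |
    sort -u | wc -l | tr -d ' ')

echo "records=$total unique=$unique duplicates=$((total - unique))"
rm -f "$toy"
```

On the toy file this reports 3 records, 2 unique keys and 1 duplicate. Running the same count on each caller's VCF and on the merged VCF would show whether the merged file really contains repeated records.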
In each file from each variant caller, please normalise the variants and set the IDs to be unique. Then do the merge.
Please see what I am doing in Step 4, here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2
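Setting unique IDs can be sketched like this, assuming bcftools is installed and the input has already been normalised so that each record carries a single ALT allele. The helper name and file names are placeholders, and this is not necessarily the exact command used in the linked tutorial.

```shell
# Hypothetical helper: overwrite the ID column with CHROM_POS_REF_ALT,
# then index the output.
# $1 = normalised input VCF/BCF, $2 = output .vcf.gz (placeholders)
set_unique_ids() {
    bcftools annotate --set-id '%CHROM\_%POS\_%REF\_%FIRST_ALT' -Oz -o "$2" "$1" &&
    bcftools index -t "$2"
}

# Example usage (file names are placeholders):
# set_unique_ids caller1.norm.vcf.gz caller1.norm.ids.vcf.gz
```

With IDs built from position and alleles, the same variant gets the same ID in every caller's file, which makes it easier to compare records across files.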
I did exactly as you mentioned in Step 4, but it didn't help with the merging issue!!! What should I do now? Please help.
What is the "merging issue", exactly? You tabulated some numbers and think that they are incorrect? Please show records from your individual VCFs, and then the merged VCF, that highlight the issue. Thanks.
Sir, I am new to bioinformatics and am developing a simple pipeline that includes variant calling with 3 different variant callers on the same sample (NA12878 GIAB Garvan data).
Now, after merging with bcftools merge, I get a VCF file that contains a header like this:
Here you can see it shows three samples of NA12878 because it came from 3 different variant callers. That's why I think I am having a merging issue. Thank you
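Three sample columns are the expected result of bcftools merge here: it treats each input file as a separate sample, and with --force-samples it resolves clashing sample names by prepending the index of the file to the name. The sketch below lists the sample columns of a VCF header with awk; the header line and the name SAMPLE are made up for illustration and are not the header from the post.

```shell
# Toy header line, made up for illustration; SAMPLE is a placeholder name.
toy=$(mktemp)
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\t2:SAMPLE\t3:SAMPLE\n' > "$toy"

# Sample names are every column after FORMAT (column 9) on the #CHROM line.
samples=$(awk -F'\t' '$1 == "#CHROM" { for (i = 10; i <= NF; i++) print $i }' "$toy")
n_samples=$(printf '%s\n' "$samples" | wc -l | tr -d ' ')

printf '%s\n' "$samples"
echo "sample columns: $n_samples"
rm -f "$toy"
```

With bcftools installed, bcftools query -l on the merged file lists the same names.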
You didn't answer Kevin's question:
we want to see some variants, not the samples.
Maybe you need concat, not merge
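A rough sketch of the concat route, assuming bcftools is installed, every input is bgzip-compressed and indexed, and all inputs carry the same sample column(s) in the same order. The helper name and file names are placeholders, and whether the callers' differing INFO/FORMAT fields combine cleanly is something to check on the real files.

```shell
# Hypothetical helper: concatenate call sets from the same sample,
# allowing overlapping positions and dropping duplicate records.
# $1 = output .vcf.gz, remaining arguments = compressed, indexed inputs
concat_dedup() {
    out=$1; shift
    bcftools concat --allow-overlaps --remove-duplicates -Oz -o "$out" "$@"
}

# Example usage (file names are placeholders):
# concat_dedup combined.vcf.gz caller1.norm.vcf.gz caller2.norm.vcf.gz caller3.norm.vcf.gz
```

Unlike merge, concat keeps a single set of sample columns, so the output has one sample rather than three.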
I am sharing screenshots of the VCF files generated by the 3 variant callers; the last one is the merged VCF file. The files are opened using Notepad++.