Hi,
I have three VCF files and I want to merge them into the following format:
#CHROM POS REF ALT-1 ALT-2 ALT-3
I used the following command for this purpose:
bcftools merge -m all file1.vcf file2.vcf file3.vcf | bcftools query -f '%CHROM\t%POS\t%REF\t%ALT{0}\t%ALT{1}\t%ALT{2}\n' > combined.txt
These are the numbers of variants in each file:
file1.vcf = 665,848
file2.vcf = 741,666
file3.vcf = 825,351
combined.txt = 705,445
combined.txt looks like this:
I read that bcftools merge writes an output record if there is at least one variant at a given position. In my combined.txt, only ALT-1 has entries; the ALT-2 and ALT-3 columns contain '.' throughout.
If a record is written whenever there is at least one variant at a position, why are the other two columns empty? I don't understand how it decides which variants end up in the file.
Also, if there are other ways to do this, please suggest them. I hope someone can help me with this.
Thank you
Oh, that does make sense. So how can I get the alternate alleles from the second and third files?
I think there is a gap in your understanding of the concepts.
The alternate allele at a biallelic variant locus is the same across samples; only a multi-allelic locus has multiple alternate alleles. In other words, %ALT{1} and %ALT{2} refer to the second and third alternate alleles at a site, not to the second and third files. So if you want the genotype of each sample, add
[\t%TGT]
to the bcftools query format string after you're done merging.
Yes, my VCFs are single-sample VCFs. Oh okay, I will try that. Thank you.
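For anyone finding this later, the [\t%TGT] suggestion could look roughly like the sketch below. It reuses the merge command from the question and only changes the query format string. The .vcf.gz file names are assumptions: as far as I know, bcftools merge expects bgzip-compressed, indexed inputs, so the plain .vcf files would need compressing and indexing first. The guard simply skips the run when bcftools or the input files are not present.

```shell
# Format string: site columns, then one translated genotype (%TGT) per sample.
# The [...] brackets make bcftools query loop over all samples in the merged file.
fmt='%CHROM\t%POS\t%REF\t%ALT[\t%TGT]\n'

# Assumed file names; each would be prepared beforehand with something like
#   bgzip file1.vcf && bcftools index file1.vcf.gz
if command -v bcftools >/dev/null 2>&1 && [ -f file1.vcf.gz ]; then
    bcftools merge -m all file1.vcf.gz file2.vcf.gz file3.vcf.gz |
        bcftools query -f "$fmt" > combined.txt
fi
```

With three single-sample VCFs this should give one genotype column per file after the ALT column. A sample that has no record at a site is expected to show up as a missing genotype rather than an allele.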