Merging VCF files from different populations
23 days ago
Armin ▴ 10

Hi Everyone,

I have multiple VCF files from different plant populations. For a STRUCTURE analysis I need all the populations in one VCF file. Although the same reference genome was used to generate all the VCF files, merging them deletes population-specific alleles, which in turn removes the genetic structure within each population.

Is there a standard way to merge them without losing population-specific alleles?

SNPs alleles GBS merging vcf • 313 views

"when I merge them"

How did you merge them?

23 days ago
Michael 55k

Were these VCF files all produced with the same pipeline? If not, you might introduce method bias into your STRUCTURE analysis. If you have the genomic VCF (gVCF) files, it may be better to merge those, as they have a genotype for each position.

I have used the following code for merging in a Snakemake workflow:

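 # Index each input; bcftools merge needs indexed, bgzip-compressed VCFs
 bcftools index -f {input.vcf1}
 bcftools index -f {input.vcf2}
 # Merge; -0 sets genotypes at sites absent from a sample's file to 0/0
 bcftools merge --threads {threads} -0 -o {output} {input.vcf1} {input.vcf2}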

This can be extended to more than two files. The -0 (--missing-to-ref) option sets the genotype to homozygous reference (0/0), rather than missing, at every position where a sample has no genotype. See https://samtools.github.io/bcftools/bcftools.html
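As a rough sketch of the multi-file case outside Snakemake, something like the following might work. The file names pop1.vcf.gz, pop2.vcf.gz, pop3.vcf.gz and merged.vcf.gz, and the thread count, are placeholders I made up, not names from the workflow; the script only prints the command if bcftools or the first input is missing.

```shell
# Hypothetical file names; substitute your own bgzip-compressed VCFs.
vcfs="pop1.vcf.gz pop2.vcf.gz pop3.vcf.gz"
out="merged.vcf.gz"
threads=4

# Build the command as a string so it can be checked before anything runs.
# -0 fills absent genotypes with 0/0; -Oz writes compressed VCF output.
merge_cmd="bcftools merge --threads $threads -0 -Oz -o $out $vcfs"

if command -v bcftools >/dev/null 2>&1 && [ -e pop1.vcf.gz ]; then
    for f in $vcfs; do
        bcftools index -f "$f"   # merge needs an index for every input
    done
    $merge_cmd
    bcftools index -f "$out"     # index the merged file for downstream tools
else
    echo "dry run: $merge_cmd"
fi
```

The file list is kept in a plain space-separated variable for brevity, so this assumes file names without spaces.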

Here is the whole workflow this is from: https://github.com/mdondrup/admixture_workflow Feel free to adapt it to your needs; it is currently set up for yeast. It uses ADMIXTURE instead of STRUCTURE, along with a number of other tools, and generates ADMIXTURE plots with an improved choice of colors.
