Hi All,
I am new to SNP analysis. I have 12 genomes (putatively the same strain) that I have compared against a reference using Snippy (which calls SNPs using FreeBayes). I ran Snippy on each genome separately, so I have 12 individual VCF files. I want to calculate Tajima's D across all of these genomes. Do I need to merge these VCFs using vcf-merge, or is there some other method?
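For reference, the statistic in question can be sketched in a few lines. This is a minimal illustration of the standard Tajima's D formula, not what vcftools does internally. It assumes one haploid 0/1 call per sample at each site with no missing data, which is an assumption about the data rather than something confirmed here. The function name `tajimas_d` and the input layout are made up for the example.

```python
from math import sqrt

def tajimas_d(sites):
    """Tajima's D for a list of sites; each site is a list of 0/1 alleles,
    one entry per haploid sample (hypothetical input layout)."""
    if not sites:
        return None
    n = len(sites[0])
    if n < 4:
        return None  # the variance term is zero for n = 2 and n = 3
    # Keep only segregating sites (both alleles present).
    alt_counts = [sum(site) for site in sites]
    seg = [k for k in alt_counts if 0 < k < n]
    S = len(seg)
    if S == 0:
        return None
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its standard deviation.
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

With this layout, sites where the alternate allele is a singleton pull the value negative, and sites at intermediate frequency pull it positive.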
So far I have indexed the 12 VCF files with tabix and merged them into one VCF, then sorted the merged VCF (did I need to do this?). All of that appears to run smoothly. However, when I run vcftools --TajimaD
I get the following warnings:
Parameters as interpreted:
--gzvcf JKH266_merged_sorter.vcf.gz --out TajimaD --TajimaD 100000
Using zlib version: 1.2.11
Warning: Expected at least 2 parts in INFO entry: ID=AB,Number=A,Type=Float,Description="Allele balance at heterozygous sites: a number between 0 and 1 representing the ratio of reads showing the reference allele to all reads, considering only reads from individuals called as heterozygous">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in INFO entry: ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">
Warning: Expected at least 2 parts in FORMAT entry: ID=GL,Number=G,Type=Float,Description="Genotype Likelihood, log10-scaled likelihoods of the data given the called genotype for each possible genotype generated from the reference and alternate alleles given the sample ploidy">
Warning: Expected at least 2 parts in INFO entry: ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
After filtering, kept 12 out of 12 Individuals
Outputting Tajima's D Statistic...
TajimaD: Only using fully diploid sites.
TajimaD: Only using bialleleic sites.
After filtering, kept 1084 out of a possible 1084 Sites
Run Time = 0.00 seconds
It looks like the header of the merged VCF file is corrupted. Is there anything I can do to fix this? Or should I be approaching this analysis in a completely different way?
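One way to check whether the header really is malformed is to parse the INFO and FORMAT lines directly. The sketch below is a rough check, not a full VCF validator. The function name `check_header` and the regular expression are made up for illustration. It assumes the header lines have the usual `##INFO=<ID=...,Number=...,Type=...,Description="...">` shape. It also records whether the quoted Description contains a comma, since every entry flagged in the warnings above has one. Whether that is what triggers the warnings is a guess, not something confirmed here.

```python
import re

# Structured header line: ID, Number, Type, then a quoted Description.
# Commas are allowed inside the quoted Description.
HEADER_RE = re.compile(
    r'^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,]+),Description="(.*)".*>$'
)

def check_header(lines):
    """Return (parsed, malformed) for the ##INFO / ##FORMAT lines in `lines`.

    parsed    -- list of (kind, field_id, description_has_comma)
    malformed -- list of lines that did not match the expected shape
    """
    parsed, malformed = [], []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith(("##INFO=", "##FORMAT=")):
            continue  # other header lines and data records are ignored
        match = HEADER_RE.match(line)
        if match is None:
            malformed.append(line)
        else:
            kind, field_id, _number, _type, description = match.groups()
            parsed.append((kind, field_id, "," in description))
    return parsed, malformed

# Example lines: the first two are rebuilt from the warnings above,
# the third is a deliberately truncated, hypothetical entry.
example = [
    '##INFO=<ID=TYPE,Number=A,Type=String,Description="The type of allele, either snp, mnp, ins, del, or complex.">',
    '##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">',
    '##INFO=<ID=BROKEN,Number=1',
]
parsed, malformed = check_header(example)
```

If `malformed` comes back empty for the real merged file, the header lines are at least structurally intact, and the warnings may be about how the descriptions are parsed rather than about actual corruption.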
Thanks!