Hi everyone,
I have been struggling with this, and I really did search thoroughly before opening this issue. I don't know whether any tools exist for calculating the ka/ks ratio of a gene from a SnpEff-annotated VCF file. Currently I am using a script provided by Philipp here, but I am skeptical that it is doing the right thing, because the ka/ks ratio is huge for genes that should show little polymorphism. I am interpreting these values based on the dN/dS inference described here.
The script takes the zygosity of the variants into account, i.e.:
- For Heterozygous Variants (record.num_het): Adds 1 to the count because the variant appears once in the genotype.
- For Homozygous Alternates (record.num_hom_alt): Adds 2 to the count because the variant appears twice in the genotype.
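To make the weighting concrete, here is a minimal sketch of how I understand it. This is not the actual script: the function names and the example records are made up for illustration, and the only thing taken from the script is the rule "1 per heterozygous call, 2 per homozygous-alternate call".

```python
from collections import Counter


def weighted_alt_count(num_het: int, num_hom_alt: int) -> int:
    """Alternate-allele copies across samples: 1 per het call, 2 per hom-alt call."""
    return num_het + 2 * num_hom_alt


def tally_by_effect(records):
    """Sum weighted counts per effect class.

    records: iterable of (effect, num_het, num_hom_alt) tuples (hypothetical layout).
    """
    totals = Counter()
    for effect, num_het, num_hom_alt in records:
        totals[effect] += weighted_alt_count(num_het, num_hom_alt)
    return totals


# Made-up records, only to show the arithmetic.
example = [
    ("missense_variant", 3, 1),    # 3*1 + 1*2 = 5
    ("synonymous_variant", 2, 0),  # 2*1 + 0*2 = 2
    ("missense_variant", 0, 4),    # 0*1 + 4*2 = 8
]
totals = tally_by_effect(example)
print(totals["missense_variant"], totals["synonymous_variant"])
```

Note that these are raw allele counts per effect class; the sketch does not normalise by the number of synonymous or nonsynonymous sites.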
The way I calculate the genic ka/ks ratio is as follows.
- Take the genomic VCF file (pf7K covering ~20K genomes).
- Subset it by the gene's start and end coordinates using bcftools, keeping only variants that pass the "PASS,." filter.
- Run ka_ks.py on the resulting gene.vcf.
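For reference, the subsetting step looks roughly like the sketch below. The input file name, chromosome name, and coordinates are placeholders rather than my real values, and the helper function is only for illustration. It assumes the input VCF is bgzip-compressed and indexed, which `--regions` requires.

```python
import shlex


def bcftools_subset_cmd(vcf, chrom, start, end, out="gene.vcf.gz"):
    """Build a `bcftools view` command for one gene region, PASS or '.' filters only."""
    region = f"{chrom}:{start}-{end}"
    return [
        "bcftools", "view",
        "--regions", region,          # gene coordinates (needs an indexed VCF)
        "--apply-filters", "PASS,.",  # keep records whose FILTER is PASS or missing
        "--output-type", "z",         # bgzip-compressed VCF output
        "--output", out,
        vcf,
    ]


# Placeholder file name and coordinates, for illustration only.
cmd = bcftools_subset_cmd("input.vcf.gz", "chr1", 1000, 5000)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`.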
I also came across another tool, degenotate (here), which gives ka and ks values (from which I believe you can then take the ratio), but it requires you to specify outgroup samples, and I have no information about those in this dataset.
Any help you can offer would be much appreciated.