How to calculate a reliable Ka/Ks (dN/dS) ratio for genes of interest from a VCF file
8 months ago

Hi Everyone

I am struggling a bit, and I did search the internet thoroughly before opening this question. I don't know whether any tool calculates the Ka/Ks ratio for a gene from a snpEff-annotated VCF file. I am currently using a script provided by Philipp here, but I am skeptical that it is doing the right thing, because the Ka/Ks ratio comes out huge for genes that should be low in polymorphism. I am interpreting these values based on the dN/dS inference described here.

The script takes the zygosity of the variants into account, i.e.:

  • For heterozygous variants (record.num_het): adds 1 to the count, because the variant allele appears once in the genotype.
  • For homozygous alternates (record.num_hom_alt): adds 2 to the count, because the variant allele appears twice in the genotype.
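To make the weighting concrete, here is a minimal sketch of what I understand the counting to be. It is not Philipp's actual script: the function names, the `EFFECT_CLASS` mapping, and the toy records are my own assumptions for illustration, and genotypes are plain diploid GT strings rather than parsed VCF records.

```python
from collections import Counter

# Assumed mapping from snpEff effect terms to substitution class (illustrative only).
EFFECT_CLASS = {
    "missense_variant": "nonsyn",
    "synonymous_variant": "syn",
}

def weighted_alt_count(genotypes):
    """Return num_het * 1 + num_hom_alt * 2 for a list of diploid GT strings."""
    num_het = 0
    num_hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        if alleles[0] != alleles[1]:
            num_het += 1          # two different alleles
        elif alleles[0] != "0":
            num_hom_alt += 1      # two identical non-reference alleles
    return num_het + 2 * num_hom_alt

def tally(records):
    """Sum weighted counts per substitution class over (effect, genotypes) pairs."""
    counts = Counter()
    for effect, genotypes in records:
        cls = EFFECT_CLASS.get(effect)
        if cls is not None:
            counts[cls] += weighted_alt_count(genotypes)
    return counts

# Toy data, made up for this example.
records = [
    ("missense_variant", ["0/1", "1/1", "0/0", "./."]),
    ("synonymous_variant", ["0/1", "0/0", "0/0", "0/0"]),
]
counts = tally(records)
print(counts["nonsyn"], counts["syn"])
```

Note that a ratio of these raw weighted counts is not the same quantity as dN/dS, which divides each count by the number of nonsynonymous or synonymous sites in the gene. I have not verified whether the script applies that normalization.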

The way I calculate the genic Ka/Ks ratio is as follows:

  1. Take the genomic VCF file (pf7K covering ~20K genomes).
  2. Subset it by gene coordinates (start and end) using bcftools, keeping only variants with the "PASS,." filter.
  3. Run ka_ks.py on the resulting gene.vcf.
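For reference, step 2 can be sketched as below. The file names, chromosome name, and coordinates are placeholders rather than my real inputs, and the helper `bcftools_subset_cmd` is hypothetical. It only builds the command; region queries with `-r` assume the input VCF is bgzip-compressed and indexed.

```python
import shlex

def bcftools_subset_cmd(vcf_in, chrom, start, end, vcf_out):
    """Build a bcftools view command that keeps PASS/'.' variants in one region."""
    region = f"{chrom}:{start}-{end}"
    return [
        "bcftools", "view",
        "-f", "PASS,.",      # keep records whose FILTER is PASS or unset
        "-r", region,        # restrict to the gene coordinates
        "-Oz", "-o", vcf_out,  # write bgzip-compressed VCF
        vcf_in,
    ]

# Placeholder inputs for illustration.
cmd = bcftools_subset_cmd("pf7.vcf.gz", "Pf3D7_01_v3", 1000, 5000, "gene.vcf.gz")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, provided bcftools is on the PATH.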

I came across another tool, degenotate (here), which gives Ka and Ks values (from which I believe you can then take the ratio), but it requires you to specify outgroup samples, and I have no information about those in this dataset.

If you can offer any help, that would be much appreciated.

dnds kaks VCF • 434 views