Hi. I am working with the phase 3 1000 Genomes VCF data (available here: http://www.internationalgenome.org/data) and need to estimate the number of synonymous and non-synonymous sites.
For my analysis, if I knew what type of site each variant falls on (e.g. a non-synonymous change at a 0-fold site, or a synonymous change at a 2-fold site), then I could restrict my analysis to 0-fold and 4-fold sites and just count those sites and the number of polymorphisms at them.
However, I do not have complete codon information. The VCF files provide the reference allele and the alternative allele but not the codon within which the allele is located (which I would need to work out whether a site is 0-fold etc.). Is there any way of obtaining this information? I know UCSC has this data, but their set of alleles seems to be incomplete when compared to the data taken directly from 1000 Genomes.
If this is not possible, I would be grateful for any other suggested methods that might work.
Hi, I've just seen that nestled in the annotation is the codon information as you suggested. Many thanks, and apologies for the unnecessary question!
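In case it helps anyone else, here is a rough sketch of how the degeneracy of a site could be worked out once the codon and the position within it are known. It assumes the standard genetic code and that you have already pulled the codon string and the 0-based position out of the annotation yourself (how to do that depends on the annotation format, which is not covered here). The names `CODON_TABLE` and `fold_degeneracy` are just made up for this example.

```python
from itertools import product

# Standard genetic code, bases in T, C, A, G order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def fold_degeneracy(codon, pos):
    """Return the degeneracy (0, 2, 3 or 4) of position `pos` (0-based) in `codon`.

    The site is n-fold degenerate if n of the four possible bases at that
    position encode the same amino acid; by convention a site where every
    change alters the amino acid is called 0-fold.
    """
    codon = codon.upper()
    amino_acid = CODON_TABLE[codon]
    # Count the alternative bases that leave the amino acid unchanged.
    synonymous = sum(
        1
        for base in BASES
        if base != codon[pos]
        and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == amino_acid
    )
    return {0: 0, 1: 2, 2: 3, 3: 4}[synonymous]


if __name__ == "__main__":
    # Third position of a glycine codon versus its first position.
    print(fold_degeneracy("GGT", 2), fold_degeneracy("GGT", 0))
```

With something like this, keeping only the sites where the function returns 0 or 4 would give the 0-fold and 4-fold sets described in the question. Note that the degeneracy here is computed from the single codon passed in, so if the reference and alternative codons differ at more than one position you would need to decide which one to classify.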