Hi, I would like to get absolute numbers of functional and synonymous variants from VCF or tab-delimited text files: just a raw count of how many coding and non-coding variants I have in a sequenced human exome sample or a gnomAD v3.1 genome file. I looked for similar threads here, but only found solutions for calculating d_N/d_S ratios, or approaches suitable for model organisms.
Is it possible to do it with Illumina Cloud tools?
Thank you!
OK, I already have VCFs annotated with VEP, but I cannot find this information in them. VEP's web interface only reports the percentage of variants.
aah, ok, I think SnpEff can do the simple thing I need: https://pcingola.github.io/SnpEff/se_outputsummary/
I have two types of VCFs: raw and annotated with VEP. I guess I need to use the raw files for summarizing?
SnpEff adds an ANN field, holding its annotation, to the INFO column of any VCF you give it.
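If you want raw counts from a SnpEff-annotated VCF without relying on the summary page, a minimal Python sketch could look like the one below. It assumes the standard ANN layout, where entries are comma-separated, fields within an entry are pipe-separated, the effect is the second field, and combined effects are joined with `&`. It also assumes the first ANN entry is the most severe one, which is how SnpEff is documented to order them. The function names are made up for illustration.

```python
from collections import Counter


def first_ann_effects(info):
    """Return the effect terms of the first ANN entry in an INFO string, or []."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            first_entry = field[4:].split(",")[0]
            parts = first_entry.split("|")
            if len(parts) > 1:
                # Combined effects look like "splice_region_variant&synonymous_variant".
                return parts[1].split("&")
    return []


def count_effects(lines):
    """Tally effect terms over all records; header and blank lines are skipped."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        for effect in first_ann_effects(info):
            counts[effect] += 1
    return counts
```

Note that a record whose first entry carries a combined effect is counted once under each of its terms, so the totals can exceed the number of records.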
The VEP annotation should include the information you need. You may be confused because you are looking for a "non-synonymous" term, but you will never find one: anything that is not synonymous has a consequence that is described explicitly. So to get the numbers of synonymous vs. non-synonymous variants, you have to count synonymous variants on the one hand, and all other exonic, non-synonymous consequences on the other. According to Ensembl's docs, these would be missense variant, inframe insertion, inframe deletion, stop gained, frameshift variant and coding sequence variant (written with underscores in VEP output, e.g. missense_variant).
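The counting described above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the VCF was written by VEP with the default `CSQ` INFO key, the position of the `Consequence` field is read from the `Format:` string in the header, and a record counts as non-synonymous if any of its transcript annotations carries one of the terms listed above. The set and function names are invented for this example.

```python
import re
from collections import Counter

# Terms from the answer above, spelled as VEP writes them.
NONSYNONYMOUS = {
    "missense_variant", "inframe_insertion", "inframe_deletion",
    "stop_gained", "frameshift_variant", "coding_sequence_variant",
}
SYNONYMOUS = {"synonymous_variant"}


def csq_consequence_index(header_line):
    """Locate 'Consequence' in the CSQ format declared in the VCF header."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no CSQ format found in header line")
    return match.group(1).strip().split("|").index("Consequence")


def classify(info, cons_idx):
    """Classify one record from its INFO string."""
    terms = set()
    for field in info.split(";"):
        if field.startswith("CSQ="):
            for entry in field[4:].split(","):      # one entry per transcript/allele
                parts = entry.split("|")
                if len(parts) > cons_idx:
                    terms.update(parts[cons_idx].split("&"))
    if terms & NONSYNONYMOUS:
        return "nonsynonymous"
    if terms & SYNONYMOUS:
        return "synonymous"
    return "other"


def count_vcf(lines):
    """Count records per class; a multi-allelic line is counted once."""
    counts = Counter()
    cons_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            cons_idx = csq_consequence_index(line)
        elif line and not line.startswith("#"):
            if cons_idx is None:
                raise ValueError("CSQ header not seen before the first record")
            counts[classify(line.split("\t")[7], cons_idx)] += 1
    return counts
```

To run it on a file, pass an open text handle, e.g. `count_vcf(open("annotated.vcf"))`, where the file name is a placeholder; for a gzipped VCF, `gzip.open(path, "rt")` gives the same kind of handle. Whether to classify per record, per allele, or per canonical transcript only is a design choice, and the numbers will differ depending on which you pick.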