Question

Calculating synonymous and non-synonymous allele counts in a VCF file

4.2 years ago

storm1907 ▴ 30

Hi, I would like to get absolute numbers of functional and synonymous variants in VCF or tab-delimited txt files: only a raw count of how many coding and non-coding variants I have in a sequenced human exome sample or a gnomAD v.3.1 genome file. I searched for similar threads here, but only found solutions for calculating d_N/d_S ratios, or approaches suited to model organisms.

Is it possible to do it with Illumina Cloud tools?

Thank you!

VCF • 1.8k views

ADD COMMENT • link updated 4.2 years ago by Jorge Amigo 14k • written 4.2 years ago by storm1907 ▴ 30

score 1 · Answer 1 · 2021-04-01

4.2 years ago

Jorge Amigo 14k

Almost any VCF annotator (SnpEff, ANNOVAR, VEP, ...) will give you the genic annotation you need. They are meant to be installed and run locally, and SnpEff is the one I'd personally recommend for genic annotation, but VEP's web interface may be all that you need.

ADD COMMENT • link 4.2 years ago by Jorge Amigo 14k

OK, I already have VCFs annotated with VEP, but I cannot find this information in them. VEP's web interface only reports the percentage of variants.

ADD REPLY • link 4.2 years ago by storm1907 ▴ 30

Ah, OK, I think SnpEff can do the simple thing I need: https://pcingola.github.io/SnpEff/se_outputsummary/

I have two types of VCFs: raw and annotated with VEP. I guess I need to use the raw files for summarizing?

ADD REPLY • link 4.2 years ago by storm1907 ▴ 30

Any VCF provided to SnpEff will get an ANN field added to its INFO column, containing the SnpEff annotation.
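To make the ANN field concrete, here is a minimal Python sketch that tallies effect terms from a SnpEff-annotated VCF. It assumes the standard ANN layout, where entries are comma-separated, subfields are pipe-separated, the effect is the second subfield, and combined effects are joined with `&`. The function names (`ann_effects`, `tally_effects`) and the demo records are made up for illustration, and the demo ANN entries are truncated to their first few subfields. Each effect term is counted at most once per VCF record, which is a choice made here, not necessarily how SnpEff's own summary counts.

```python
from collections import Counter


def ann_effects(info):
    """Return the set of effect terms found in the ANN field of one INFO string."""
    ann = next((f[4:] for f in info.split(";") if f.startswith("ANN=")), "")
    effects = set()
    for entry in ann.split(","):          # one entry per allele/transcript pair
        fields = entry.split("|")
        if len(fields) > 1:               # second subfield holds the effect
            effects.update(fields[1].split("&"))
    return effects


def tally_effects(vcf_lines):
    """Count how many VCF records carry each effect term (once per record)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue                      # skip header and blank lines
        info = line.rstrip("\n").split("\t")[7]
        counts.update(ann_effects(info))
    return counts


# Hypothetical records; real ANN entries have more subfields than shown here.
DEMO_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\tANN=G|missense_variant|MODERATE|GENE1",
    "1\t200\t.\tC\tT\t.\tPASS\tDP=30;ANN=T|synonymous_variant|LOW|GENE1,T|synonymous_variant|LOW|GENE1",
    "1\t300\t.\tG\tA\t.\tPASS\tDP=12",
]

if __name__ == "__main__":
    for effect, n in sorted(tally_effects(DEMO_VCF).items()):
        print(effect, n)
```

For a real file, the same function can be fed an open file handle instead of the demo list, since it only iterates over lines.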

ADD REPLY • link 4.2 years ago by Jorge Amigo 14k

The VEP annotation should include the information you need. You may be confused because you are looking for a "non-synonymous" term, but VEP never reports one: anything that is not synonymous gets a consequence that is described explicitly. To get the numbers of synonymous vs. non-synonymous variants, you therefore have to count synonymous variants on the one hand, and all other exonic non-synonymous consequences on the other. According to Ensembl's docs, these would be missense variant, inframe insertion, inframe deletion, stop gained, frameshift variant and coding sequence variant.
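The counting described above could be sketched in Python roughly as follows. This assumes a VEP-annotated VCF whose `CSQ` header line lists the subfield order after `Format: `, and that consequence terms appear in their underscore form (for example `missense_variant`), joined by `&` when a transcript has several. The function name `count_syn_nonsyn`, the demo records, and the decision to count each VCF record once (non-synonymous taking priority if any transcript has such a consequence) are assumptions for illustration, not part of VEP.

```python
import re
from collections import Counter

# Consequence terms listed in the reply above, in VEP's underscore form.
NON_SYNONYMOUS = {
    "missense_variant", "inframe_insertion", "inframe_deletion",
    "stop_gained", "frameshift_variant", "coding_sequence_variant",
}
SYNONYMOUS = {"synonymous_variant"}


def count_syn_nonsyn(vcf_lines):
    """Classify each VCF record once by its VEP consequences and count classes."""
    counts = Counter()
    csq_index = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##INFO=<ID=CSQ"):
            # The header gives the CSQ subfield order after "Format: ".
            fmt = re.search(r'Format: ([^">]+)', line).group(1)
            csq_index = fmt.split("|").index("Consequence")
            continue
        if not line or line.startswith("#"):
            continue
        if csq_index is None:
            raise ValueError("no CSQ header line found before the first record")
        info = line.split("\t")[7]
        csq = next((f[4:] for f in info.split(";") if f.startswith("CSQ=")), "")
        terms = set()
        for entry in csq.split(","):      # one entry per allele/transcript pair
            fields = entry.split("|")
            if len(fields) > csq_index:
                terms.update(fields[csq_index].split("&"))
        if terms & NON_SYNONYMOUS:
            counts["non_synonymous"] += 1
        elif terms & SYNONYMOUS:
            counts["synonymous"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical records with a shortened CSQ format, for illustration only.
DEMO_VEP_VCF = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\tCSQ=G|missense_variant|MODERATE|GENE1",
    "1\t200\t.\tC\tT\t.\tPASS\tCSQ=T|synonymous_variant|LOW|GENE1,T|intron_variant|MODIFIER|GENE2",
    "1\t300\t.\tG\tA\t.\tPASS\tCSQ=A|splice_region_variant&synonymous_variant|LOW|GENE3",
    "1\t400\t.\tT\tC\t.\tPASS\tCSQ=C|intergenic_variant|MODIFIER|",
]

if __name__ == "__main__":
    print(dict(count_syn_nonsyn(DEMO_VEP_VCF)))
```

Counting per record rather than per transcript avoids inflating the totals for variants that overlap many transcripts; if per-transcript or per-allele counts are wanted instead, the classification would have to move inside the loop over CSQ entries.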

ADD REPLY • link 4.2 years ago by Jorge Amigo 14k