Why are gnomAD AF and gnomAD exome AF so different?
4.0 years ago
nub6 ▴ 20

Hi,

I am trying to filter my exome data for variants with an allele frequency <0.05 in gnomAD, but I notice that the gnomAD AF and gnomAD exome AF annotations give opposite results!

For example, for rs73976541 allele C, the gnomAD exome AF is 0.005, whereas the gnomAD AF is >0.9.

Why is this? I have WES data aligned to GRCh38.

Thanks!

gnomAD WES next-gen • 4.0k views

Perhaps you're looking at genome and exome AF!


But shouldn't they be the same?


Note that although the numbers of genomes and exomes in gnomAD differ, so the allele frequencies need not be identical, they should at least be similar. I agree that 0.005 and 0.02 are not that similar when dealing with really rare variants, but if you see a discrepancy like yours (0.005 vs >0.9), you can be fairly sure the two frequencies do not refer to the same allele. If they do, you should report it to the gnomAD team so they can correct it.


Actually not. Take a look at the gnomAD FAQ. The number of samples in the exome and genome datasets is very different!


Oh thanks. That is useful to know :)


The sample count also varies a lot at intergenic regions! You might get only one or two samples from the WES dataset with a call at a site, compared with all of the WGS samples, which leaves very unbalanced frequencies.
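
To illustrate the point about unbalanced sample counts, here is a minimal sketch of how an allele frequency follows from the allele count (AC) and allele number (AN). The counts below are hypothetical, made up only for illustration, and `af_from_counts` is not a gnomAD function:

```python
def af_from_counts(ac, an):
    """Allele frequency as allele count (AC) divided by allele number (AN)."""
    if an == 0:
        return None  # no called alleles at this site, frequency undefined
    return ac / an


# Hypothetical counts at a poorly covered intergenic site (NOT real gnomAD data).
exome = {"ac": 1, "an": 4}          # only 2 diploid exome samples called here
genome = {"ac": 300, "an": 30000}   # 15,000 diploid genome samples called here

exome_af = af_from_counts(**exome)
genome_af = af_from_counts(**genome)

# Pooling both datasets: sum the counts, do not average the two frequencies.
combined_af = af_from_counts(exome["ac"] + genome["ac"],
                             exome["an"] + genome["an"])

print(f"exome AF    = {exome_af:.4f} (AN={exome['an']})")
print(f"genome AF   = {genome_af:.4f} (AN={genome['an']})")
print(f"combined AF = {combined_af:.4f}")
```

With so few alleles behind it, the exome estimate is dominated by a single carrier, so checking AN alongside AF before filtering seems prudent.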

4.0 years ago

Quick answer: you're dealing with different references, therefore the reference alleles can differ, therefore the frequencies can differ too.

If you check rs73976541 on gnomAD v2.1.1, you're using GRCh37, where the reference allele is C and the alternative allele T occurs at a global frequency of ~0.005 in the exomes and ~0.02 in the genomes analyzed.

If you check rs73976541 on gnomAD v3.1, you're using GRCh38, where the reference allele is T and the alternative allele C occurs at a global frequency of ~0.9795 in the genomes analyzed.

So changing from GRCh37 to GRCh38 changed the reference allele for rs73976541! This is unexpected to the untrained eye, but it can happen, as you can see for yourself on dbSNP. The take-home message is that a frequency refers to a particular allele, not to the variant as a whole.
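
As a rough sketch of that take-home message, the frequencies quoted above can be put on the same footing by always asking for the frequency of one named allele. This assumes the site is biallelic, so the reference frequency is simply one minus the alternative frequency; the helper `frequency_of` is hypothetical and not part of any gnomAD tool:

```python
def frequency_of(allele, ref, alt, alt_af):
    """Frequency of `allele` at a biallelic site, given the ALT allele frequency.

    Assumption: exactly two alleles segregate, so AF(ref) = 1 - AF(alt).
    """
    if allele == alt:
        return alt_af
    if allele == ref:
        return 1.0 - alt_af
    raise ValueError(f"allele {allele!r} is neither ref {ref!r} nor alt {alt!r}")


# Approximate global genome frequencies quoted in this answer for rs73976541.
grch37_genomes = {"ref": "C", "alt": "T", "alt_af": 0.02}     # gnomAD v2.1.1
grch38_genomes = {"ref": "T", "alt": "C", "alt_af": 0.9795}   # gnomAD v3.1

# Ask for the same allele (T) in both builds.
t_in_v2 = frequency_of("T", **grch37_genomes)
t_in_v3 = frequency_of("T", **grch38_genomes)

print(f"AF(T), GRCh37 genomes: {t_in_v2:.4f}")
print(f"AF(T), GRCh38 genomes: {t_in_v3:.4f}")
```

Once both numbers describe allele T, they agree at roughly 0.02, so the apparent contradiction comes from the swapped reference rather than from the data.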
