Hi, I noticed that some gnomAD SNPs have an allele frequency of 0, while others are simply not found ("No genes were found in this region"). What is the difference between the two cases?
For example, Y-10010520-10010520 shows a frequency of 0 everywhere. Does that mean the variant is very rare and the frequency is therefore zero? And what about AF=NA?
Also, does the gnomAD genome data contain the information from the exomes as well, or is the exome data not included in the genome data?
Hello. Someone will correct me if I am wrong, but from what I understand, variants with AF=NA were simply not found in gnomAD.
On the other hand, variants with a frequency of 0 were likely found among gnomAD samples, but the calls were deemed poor quality and therefore removed.
Regarding genomes vs. exomes, it depends on which version of the gnomAD browser you are using. gnomAD v3 only provides information from hg38-aligned whole-genome data, while v2.1 provides information from hg19-aligned whole exomes and a smaller number of genomes.
@ raphael.B thank you :) So if I understand you correctly: I'm using v2.1, therefore the exome data would not be included in the genome data? And maybe you know why the allele frequency is sometimes different between exome and genome? Is it because the number of samples is different?
I am not sure what you mean by 'include'. Do you mean that AF is computed from the aggregated genomes and exomes?
gnomAD provides two AFs when a variant was found in both genomes and exomes. Just click on the variant to get this information (at the top of the page).
AF will indeed differ between genomes and exomes since, in gnomAD v2.1, there are 125,748 exomes and 15,708 genomes. They are not the same set of samples, so the allele frequencies are not the same.
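To make the point concrete, here is a minimal sketch of how an allele frequency follows from an allele count (AC) and an allele number (AN), and why the exome and genome values differ. The AC values are made up for illustration, the AN values assume an autosomal site where every sample was called (two alleles per sample), and the `allele_frequency` helper is hypothetical, not part of any gnomAD tool.

```python
def allele_frequency(ac, an):
    """Return AC / AN, or None if no alleles were called (AN == 0).

    Returning None for AN == 0 is a convention of this sketch only.
    """
    return ac / an if an > 0 else None


# Sample counts quoted above for gnomAD v2.1; two alleles per sample
# assumes an autosomal site with every sample called.
exome_an = 2 * 125_748
genome_an = 2 * 15_708

# Hypothetical allele counts, invented for this example.
exome_ac = 25
genome_ac = 5

exome_af = allele_frequency(exome_ac, exome_an)
genome_af = allele_frequency(genome_ac, genome_an)

# Pooling both sample sets: sum the counts, then divide.
combined_af = allele_frequency(exome_ac + genome_ac, exome_an + genome_an)

print(f"exome AF:    {exome_af:.6f}")
print(f"genome AF:   {genome_af:.6f}")
print(f"combined AF: {combined_af:.6f}")
```

With these made-up counts the genome AF comes out higher than the exome AF, simply because each is a count divided by a different number of alleles. Note also that AC = 0 with AN > 0 gives a frequency of exactly 0, which is a different situation from a site with no data at all.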
Cross-posted on bioinfo SE: https://bioinformatics.stackexchange.com/questions/20181/allele-frequency-gnomad-data where it's received an accepted answer.
Bad form, OP. Very bad form.