Allele Frequencies in Kaviar vs. gnomAD
ariel, 3.5 years ago

Below are some plots comparing allele frequencies in Kaviar and gnomAD. The two databases largely agree, but there are some odd features:

  1. There are almost no variants with a higher allele frequency in Kaviar than in gnomAD, but many with a higher frequency in gnomAD.
  2. For many variants, the gnomAD allele frequency is roughly 4X the Kaviar allele frequency.
  3. Many variants have an allele frequency close to 0 in Kaviar, while their gnomAD frequency ranges anywhere from 0 to 1.
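To put numbers on these three patterns rather than reading them off the scatter plots, a small summary over the joined table could help. This is a minimal sketch: it assumes the joined data frame has numeric columns `AF_KV` and `AF_GN` (as in the join below), and the function name `summarise_af_patterns`, the ±25% tolerance around 4X, and the 1e-3 "near zero" cutoff are all hypothetical, illustrative choices rather than anything from either database.

```r
library(dplyr)

# Summarise how often each of the three patterns occurs in the joined data.
# Assumes numeric columns AF_KV (Kaviar) and AF_GN (gnomAD). The tolerance and
# "near zero" cutoff are illustrative choices, not values from either database.
summarise_af_patterns <- function(df, ratio_tol = 0.25, near_zero = 1e-3) {
  df %>%
    mutate(
      ratio = AF_GN / AF_KV,  # Inf or NaN when AF_KV == 0
      kaviar_higher = AF_KV > AF_GN,
      near_4x = is.finite(ratio) & abs(ratio - 4) <= ratio_tol * 4,
      kaviar_near_zero = AF_KV < near_zero & AF_GN >= near_zero
    ) %>%
    summarise(
      n = n(),
      frac_kaviar_higher = mean(kaviar_higher, na.rm = TRUE),
      frac_near_4x = mean(near_4x, na.rm = TRUE),
      frac_kaviar_near_zero = mean(kaviar_near_zero, na.rm = TRUE)
    )
}

# Toy input with made-up frequencies: one row per pattern plus one concordant row
toy <- data.frame(
  AF_KV = c(0.1, 0.01, 0.0001, 0.2),
  AF_GN = c(0.1, 0.04, 0.5, 0.1)
)
summarise_af_patterns(toy)
```

On the real data this would be `summarise_af_patterns(q)`; varying `ratio_tol` shows how tight the 4X band actually is.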

One possible explanation for the 4X pattern is zygosity: perhaps a factor of 2 for each allele?
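One way to check the zygosity idea is to compare the two usual ways of summarising the same genotype counts: per-allele frequency and per-person (carrier) frequency. This sketch uses made-up counts and hypothetical helper names; it does not claim that either database reports carrier frequency. Since carrier/allele = 2(n_het + n_hom_alt) / (n_het + 2 n_hom_alt), the ratio always lies between 1 and 2, approaching 2 for rare variants. On its own, that convention difference would therefore give at most about 2X, not 4X.

```r
# Two ways to summarise the same genotype counts (hypothetical numbers):
#   allele frequency  = (n_het + 2 * n_hom_alt) / (2 * n_people)
#   carrier frequency = (n_het + n_hom_alt) / n_people
allele_freq <- function(n_het, n_hom_alt, n_people) {
  (n_het + 2 * n_hom_alt) / (2 * n_people)
}
carrier_freq <- function(n_het, n_hom_alt, n_people) {
  (n_het + n_hom_alt) / n_people
}

# Rare variant: 20 heterozygotes, no homozygotes, 10,000 people
allele_freq(20, 0, 10000)   # 0.001
carrier_freq(20, 0, 10000)  # 0.002, i.e. 2X the allele frequency

# Common variant: 3,000 heterozygotes, 1,000 homozygotes, 10,000 people
allele_freq(3000, 1000, 10000)   # 0.25
carrier_freq(3000, 1000, 10000)  # 0.4, i.e. 1.6X
```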

I couldn't find anything in the documentation that explains why the two databases relate this way, in particular why so many variants sit near 0 in Kaviar.

In the plots, each dot is an individual variant (matched by coordinate and alleles), and the axes are the allele frequencies in each database. For the second plot I randomly sample 1e6 points.

NOTE: I would love to provide a minimal reproducible example (MRE), but given the size of the databases it isn't feasible. I'm using the publicly available gnomAD on BigQuery; I downloaded Kaviar and uploaded it to BigQuery myself. I was unable to download and work with those files in RStudio or even in an AI notebook on GCP, but the inner join (performed on BigQuery) reduces the dataset enough to download it.

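```r
library(dplyr)
library(ggplot2)

# kaviar_bq / gnomad_bq: lazy tbl()s on BigQuery. select() renames AF, avoiding duplicated AF.x/AF.y columns.
q <- inner_join(
  kaviar_bq %>% select(chromosome, position, reference_allele, alternate_allele, AF_KV = AF),
  gnomad_bq %>% select(chromosome, position, reference_allele, alternate_allele, AF_GN = AF),
  by = c("chromosome", "position", "reference_allele", "alternate_allele")
) %>%
  collect()  # the join runs on BigQuery; only matched rows are downloaded
```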
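Because the join is on four key columns, any key that appears more than once in either table produces one output row per matching pair, so that variant would be counted several times in the plots. A quick check for repeated keys could look like the sketch below. `duplicated_keys` and `join_keys` are hypothetical names; the toy data is made up, and whether `group_by(across(...))` translates cleanly for a lazy BigQuery table (e.g. `duplicated_keys(gnomad_bq)`) is an assumption that may need adapting.

```r
library(dplyr)

# inner_join() returns one row per matching pair, so a key that appears twice in
# either table will duplicate that variant in the plots. This helper lists keys
# that occur more than once; join_keys mirrors the `by` columns of the join.
join_keys <- c("chromosome", "position", "reference_allele", "alternate_allele")

duplicated_keys <- function(tbl, keys = join_keys) {
  tbl %>%
    group_by(across(all_of(keys))) %>%
    summarise(n = n(), .groups = "drop") %>%
    filter(n > 1)
}

# Toy example with one duplicated variant
toy <- data.frame(
  chromosome = c("1", "1", "2"),
  position = c(100, 100, 200),
  reference_allele = c("A", "A", "C"),
  alternate_allele = c("G", "G", "T")
)
duplicated_keys(toy)
```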

```r
# Every joined variant, colored by chromosome, with y = x and y = 4x reference lines
q %>%
  ggplot(aes(x = AF_KV, y = AF_GN, color = chromosome)) +
  geom_point() +
  ggtitle("Allele Frequencies: Kaviar vs. gnomAD") +
  xlab("Kaviar") +
  ylab("gnomAD") +
  geom_abline(intercept = 0, slope = 1) +
  geom_abline(intercept = 0, slope = 4)
```
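On linear axes most variants are squeezed into the corner near 0, which is exactly where pattern 3 lives. Log10 axes may spread that region out. This is a sketch with hypothetical function names; note that exact zeros cannot be drawn on a log scale, so they are dropped here and would need to be counted separately (e.g. `sum(q$AF_KV == 0)`).

```r
library(ggplot2)

# Keep only rows where both frequencies are positive; log10 axes cannot show 0
# (ggplot2 would drop those points with a warning anyway).
positive_af <- function(df) {
  df[!is.na(df$AF_KV) & !is.na(df$AF_GN) & df$AF_KV > 0 & df$AF_GN > 0, ]
}

plot_af_log10 <- function(df) {
  ggplot(positive_af(df), aes(x = AF_KV, y = AF_GN)) +
    geom_point(alpha = 0.1, size = 0.5) +
    geom_abline(intercept = 0, slope = 1) +
    # On log10 axes, y = 4x becomes log10(y) = log10(4) + log10(x)
    geom_abline(intercept = log10(4), slope = 1, linetype = "dashed") +
    scale_x_log10() +
    scale_y_log10() +
    labs(title = "Allele Frequencies: Kaviar vs. gnomAD (log10 axes)",
         x = "Kaviar", y = "gnomAD")
}

toy <- data.frame(AF_KV = c(0, 0.01, 0.1), AF_GN = c(0.2, 0.04, 0.1))
p <- plot_af_log10(toy)
```

With the real data this would be `plot_af_log10(q)`. On these axes the 4X relationship becomes a line parallel to the diagonal rather than a steeper one.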

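```r
# Random subsample of 1e6 variants; slice_sample() supersedes sample_n() and
# silently caps n at nrow(q) instead of raising an error
q %>%
  slice_sample(n = 1e6) %>%
  ggplot(aes(x = AF_KV, y = AF_GN, color = chromosome)) +
  geom_point() +
  ggtitle("Allele Frequencies: Kaviar vs. gnomAD") +
  xlab("Kaviar") +
  ylab("gnomAD") +
  geom_abline(intercept = 0, slope = 1) +
  geom_abline(intercept = 0, slope = 4)
```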
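A diagonal band in a scatter plot is hard to quantify. Plotting the distribution of log2(AF_GN / AF_KV) instead turns agreement into a peak at 0 and the 4X relationship into a peak at 2. This is a sketch with hypothetical function names. Rows where either frequency is 0 are excluded because the ratio is undefined or infinite there, so pattern 3 has to be inspected separately.

```r
library(dplyr)
library(ggplot2)

# log2(AF_GN / AF_KV) is 0 when the databases agree and 2 when gnomAD is 4X Kaviar.
# Rows with a zero (or missing) frequency are dropped by filter().
af_log2_ratio <- function(df) {
  df %>%
    filter(AF_KV > 0, AF_GN > 0) %>%
    mutate(log2_ratio = log2(AF_GN / AF_KV))
}

plot_log2_ratio <- function(df, binwidth = 0.05) {
  ggplot(af_log2_ratio(df), aes(x = log2_ratio)) +
    geom_histogram(binwidth = binwidth) +
    geom_vline(xintercept = c(0, 2), linetype = "dashed") +
    labs(title = "log2(gnomAD AF / Kaviar AF)", x = "log2 ratio", y = "Variants")
}

toy <- data.frame(AF_KV = c(0.01, 0, 0.1), AF_GN = c(0.04, 0.3, 0.1))
af_log2_ratio(toy)
```

With the real data, `plot_log2_ratio(q)` should show how much of the data sits in the 4X peak compared with the concordant peak.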

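```r
# Hexbin density of all variants (geom_hex() needs the hexbin package installed).
# after_stat(density) replaces the deprecated ..density.. notation.
q %>%
  ggplot(aes(x = AF_KV, y = AF_GN)) +
  geom_hex(bins = 100, aes(fill = after_stat(density))) +
  ggtitle("Allele Frequencies: Kaviar vs. gnomAD") +
  xlab("Kaviar") +
  ylab("gnomAD") +
  geom_abline(intercept = 0, slope = 1) +
  geom_abline(intercept = 0, slope = 4) +
  scale_fill_distiller(palette = "Spectral", transform = "log10")  # `trans =` before ggplot2 3.5.0
```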
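Since only the joined table is small enough to download, the binning that the hexbin plot does locally could in principle be pushed into BigQuery as well, so that only one row per occupied bin comes back. The sketch below uses square bins rather than hexagons, and `bin_af_pairs` is a hypothetical name. It sticks to `mutate()`, `floor()` and `count()`, which dbplyr normally translates to SQL, but running it on the lazy join before `collect()` is an untested assumption; here it runs on a local toy data frame.

```r
library(dplyr)

# Count variants in a 2-D grid of allele-frequency bins of the given width.
# Intended to work on a local data frame or (assumption) a lazy dbplyr table.
bin_af_pairs <- function(tbl, width = 0.01) {
  tbl %>%
    mutate(
      kv_bin = floor(AF_KV / width) * width,  # lower edge of the Kaviar bin
      gn_bin = floor(AF_GN / width) * width   # lower edge of the gnomAD bin
    ) %>%
    count(kv_bin, gn_bin)
}

toy <- data.frame(AF_KV = c(0.01, 0.02, 0.55), AF_GN = c(0.03, 0.04, 0.55))
bin_af_pairs(toy, width = 0.1)
```

The resulting counts can be drawn with `geom_tile()` on a log fill scale, giving a plot similar to the hexbin one without downloading every variant.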

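[Figures: (1) scatter plot of all joined variants, colored by chromosome; (2) scatter plot of 1e6 randomly sampled variants; (3) hexbin density plot. x-axis: Kaviar AF, y-axis: gnomAD AF; reference lines at slope 1 and slope 4.]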

Tags: gnomAD, allele-frequency, kaviar