base composition of human genome
4.3 years ago
xiaoleiusc ▴ 140

Hi, All,

What is the base composition (percentage of A, T, C, and G) of the human genome? An Internet search shows that the mean GC content of the human genome is around 41%. Is the human genome A-rich or T-rich?

Thanks in advance, Xiao

4.3 years ago

This information is available from many sources, but I computed it in R for fun, using only the main chromosomes.

library("BSgenome.Hsapiens.UCSC.hg38")
library("tidyverse")

hg38 <- BSgenome.Hsapiens.UCSC.hg38
# Named vector of sequence names, so that map() returns a named list.
seq_names <- setNames(seqnames(hg38), seqnames(hg38))

base_content <- map(seq_names, function(x) {
  # Count every letter on this sequence, then keep A, C, G, T, and N.
  alphabetFrequency(hg38[[x]]) %>%
    enframe(name = "letter", value = "value") %>%
    filter(letter %in% c("A", "T", "G", "C", "N"))
})

base_content <- base_content %>%
  bind_rows(.id = "seqnames") %>%
  # Keep chr1-22, chrX, chrY, and chrM; other sequence names contain "_".
  filter(str_detect(seqnames, "^chr[[:alnum:]]+$")) %>%
  group_by(letter) %>%
  summarize(value = sum(value)) %>%
  # as.numeric() avoids integer overflow: the grand total exceeds 2^31 - 1.
  mutate(freq = value / sum(as.numeric(value)))

Here are the results for one strand of human DNA, summed over the main chromosomes. The counts for the other strand are simply the complement of these: swap A with T and C with G. Note that freq is computed with N included in the total.

>base_content
# A tibble: 5 x 3
  letter     value   freq
  <chr>      <int>  <dbl>
1 A      867153993 0.281 
2 C      599043897 0.194 
3 G      601515125 0.195 
4 N      150630720 0.0488
5 T      869942666 0.282
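
Since freq above includes N in the denominator, the four bases do not sum to 1. As a rough sketch (not part of the original answer), the counts from the table can be renormalized over A, C, G, and T only. The names counts, acgt, acgt_freq, and gc_content are illustrative.

```r
# Counts copied from the table above (one strand, main hg38 chromosomes).
# Stored as doubles so that sums above 2^31 - 1 do not overflow.
counts <- c(A = 867153993, C = 599043897, G = 601515125,
            N = 150630720, T = 869942666)

# Drop N before normalizing so that the four bases sum to 1.
acgt <- counts[c("A", "C", "G", "T")]
acgt_freq <- acgt / sum(acgt)

# GC content among called (non-N) bases.
gc_content <- unname(acgt_freq["G"] + acgt_freq["C"])

print(round(acgt_freq, 3))
print(round(gc_content, 3))
```

With these counts, the called bases come out at roughly 29.5% A, 29.6% T, 20.4% C, and 20.5% G, for a GC content of about 40.9%, in line with the ~41% figure mentioned in the question.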

Beautiful R code, thanks! Based on your analysis of the human genome (haploid, if I understand correctly), there seems to be no obvious GC skew or AT skew, which is very interesting.
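
To put a number on "no obvious skew", here is a minimal sketch using the counts from the answer above. It assumes the common definitions AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C). The variable names are illustrative.

```r
# Counts from the answer above (one strand, main hg38 chromosomes).
counts <- c(A = 867153993, C = 599043897, G = 601515125, T = 869942666)

# Skew is the normalized difference between the two bases of a pair,
# measured on a single strand; 0 means perfectly balanced.
at_skew <- unname((counts["A"] - counts["T"]) / (counts["A"] + counts["T"]))
gc_skew <- unname((counts["G"] - counts["C"]) / (counts["G"] + counts["C"]))

print(signif(at_skew, 3))
print(signif(gc_skew, 3))
```

Both values are well under 1% in magnitude (AT skew about -0.0016, GC skew about 0.0021), so at the whole-genome level A and T, and G and C, are nearly balanced on a single strand. Local skews in particular regions are not captured by these genome-wide totals.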


Overall, the human genome is AT-rich. However, GC-rich CpG islands are common in promoters.
