How to create a graph that shows pairwise Fst values at single SNP loci across the all chromosomes in R
1
2
Entering edit mode
20 months ago
Curls ▴ 40

Hi all,

I am trying to obtain a similar graph as shown in the figure with my data. My data includes Fst values for each SNP based the populations (pop1 vs pop2 etc) from admixture analysis. I would appreciate any help. enter image description here

R Fst graph • 1.3k views
ADD COMMENT
2
Entering edit mode
20 months ago
cmdcolin ★ 4.0k

this is not a complete answer, but it may help to empower you to create your own graphs.

i try to keep track of a lot of different visualization related tools (https://cmdcolin.github.io/awesome-genome-visualization/), but indeed, just hand-coding the solution in R is a good approach. I get my mind blown by ggplot2 capabilities

this blogpost was very enlightening for me for creating plots in R with proper e.g. chromosome labels along the bottom, and plotting data according to a 'cumulative' bp position

https://danielroelfs.com/blog/how-i-create-manhattan-plots-using-ggplot/

here is a quick approach at doing something for your problem

first generate 5 random regions (10Mbp long to make them easily visible) with bedtools random on command line, you'll have real data

bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out1.bed
bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out2.bed
bedtools random -g hg19.fa.gz.fai -n 5 -l 10000000 | sort -k1,1 -k2,2n > out3.bed
library(ggplot2)
library(tidyverse)

chrom_sizes <- read.table("hg19.chrom.sizes")
x1 <- read.table("out1.bed")
x2 <- read.table("out2.bed")
x3 <- read.table("out3.bed")
colnames(chrom_sizes) <- c("chr", "size")
colnames(x1) <- c("chr", "start", "end", "name", "score", "strand")
colnames(x2) <- c("chr", "start", "end", "name", "score", "strand")
colnames(x3) <- c("chr", "start", "end", "name", "score", "strand")

# sometimes read.table interprets chromosome names as integers, sometimes
# characters, force to characters
x1$chr <- as.character(x1$chr)
x2$chr <- as.character(x2$chr)
x3$chr <- as.character(x3$chr)


x1$score <- runif(n = nrow(x1), min = 1, max = 100)
x2$score <- runif(n = nrow(x1), min = 1, max = 100)
x3$score <- runif(n = nrow(x1), min = 1, max = 100)


data_cum <- chrom_sizes %>%
  group_by(chr) %>%
  summarise(max_bp = max(size)) %>%
  arrange(-max_bp) %>%
  mutate(bp_add = lag(cumsum(as.numeric(max_bp)), default = 0))

process <- function(df) {
  df %>%
    inner_join(data_cum, by = "chr") %>%
    mutate(bp_start_cum = start + bp_add) %>%
    mutate(bp_end_cum = end + bp_add)
}

data1 <- process(x1)
data2 <- process(x2)
data3 <- process(x3)

data1$id <- "id1"
data2$id <- "id2"
data3$id <- "id3"

df <- rbind(data1, data2, data3)
print(df)

axis_set <- data_cum %>%
  group_by(chr) %>%
  mutate(center = bp_add)



p <- ggplot(data = df, aes(color = score, xmin = bp_start_cum, xmax = bp_end_cum, ymin = 0, ymax = 1)) +
  geom_rect() +
  scale_x_continuous(label = axis_set$chr, breaks = axis_set$center) +
  facet_grid(rows = vars(id)) +
  theme(
    axis.text.x = element_text(angle = 60, size = 8, vjust = 0.5),
    axis.ticks.y = element_blank(),
    axis.text.y = element_blank()
  )


ggsave("out.png", p, width = 11, height = 3)

generates a figure like this

as you can see it's not really done, and not identical to yours, but might be a starting point. updated slightly with facet_grid after posting originally

example todos

  • chromosomes not sorted by name, but by length
  • labels on left instead of right

enter image description here

ADD COMMENT

Login before adding your answer.

Traffic: 2314 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6