7.8 years ago
Tori
I have methylation data that looks like this, for example:
chr start end meth
chr7 31441 31441 0.16542433
chr7 31467 31467 0.93508003
chr7 50060 50060 0.38091076
chr7 50097 50097 0.31270269
chr7 50147 50147 0.39961491
chr7 50158 50158 0.01449239
chr7 50164 50164 0.76305873
chr7 50355 50355 0.56224390
chr7 75862 75862 0.04551076
chr7 79874 79874 0.57058403
I would like to bin the positions into 10,000 bp windows and take the median of meth in each window, with NA for windows that contain no data. The desired output looks like this:
window chr median start stop
0 chr7 NA 0 10000
1 chr7 NA 10000 20000
2 chr7 NA 20000 30000
3 chr7 0.5502522 30000 40000
4 chr7 NA 40000 50000
5 chr7 0.3902628 50000 60000
6 chr7 NA 60000 70000
7 chr7 0.3080474 70000 80000
I was able to create such a table in R, but it is incomplete: windows without any data are dropped instead of being reported as NA. R code:
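library(dplyr)  # provides %>%, mutate, group_by, summarise

# Example input: ten positions on chr7
df <- data.frame(matrix(ncol = 4, nrow = 10))
colnames(df) <- c("chr", "start", "end", "meth")
df$chr <- rep("chr7", 10)
df$start <- c(31441, 31467, 50060, 50097, 50147, 50158, 50164, 50355, 75862, 79874)
df$end <- df$start
df$meth <- runif(10, min = 0, max = 1)  # random values, so results differ between runs

bin <- 10000  # window size in bp

df2 <- df %>%
  mutate(window = start %/% bin) %>%    # window index by integer division
  group_by(window, chr) %>%
  summarise(median = median(meth)) %>%  # median per window that has data
  mutate(start = window * bin, stop = (window + 1) * bin)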
Output:
Source: local data frame [3 x 5]
Groups: window [3]
window chr median start stop
<dbl> <chr> <dbl> <dbl> <dbl>
1 3 chr7 0.5502522 30000 40000
2 5 chr7 0.3902628 50000 60000
3 7 chr7 0.3080474 70000 80000
Thank you very much!
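One possible way to get the empty windows back as NA rows is to build the full list of windows first and left-join the medians onto it. The sketch below uses base R only and the meth values from the example table above. It assumes the windows should run from 0 up to the last window that contains data; if the real chromosome length is known, that would be the better upper bound. The names `grid`, `med` and `res` are just illustrative.

```r
# Example data from the question (meth values copied from the table shown there)
df <- data.frame(
  chr   = "chr7",
  start = c(31441, 31467, 50060, 50097, 50147, 50158, 50164, 50355, 75862, 79874),
  meth  = c(0.16542433, 0.93508003, 0.38091076, 0.31270269, 0.39961491,
            0.01449239, 0.76305873, 0.56224390, 0.04551076, 0.57058403),
  stringsAsFactors = FALSE
)
bin <- 10000  # window size in bp

# Assign each position to a window by integer division
df$window <- df$start %/% bin

# Median methylation for every window that has at least one position
med <- aggregate(meth ~ window, data = df, FUN = median)
names(med)[names(med) == "meth"] <- "median"

# Full grid of windows (assumption: 0 up to the last occupied window)
grid <- data.frame(window = 0:max(df$window))

# Left join: windows without data keep median = NA
res <- merge(grid, med, by = "window", all.x = TRUE)
res$chr   <- "chr7"
res$start <- res$window * bin
res$stop  <- (res$window + 1) * bin
res <- res[, c("window", "chr", "median", "start", "stop")]
print(res)
```

The `all.x = TRUE` argument is what keeps the windows that have no match in `med`; without it, `merge` would drop them just like the `group_by`/`summarise` version does.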
What if df contains more than just chr7 in the first column?
Not the most elegant, but see the updated solution.
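For the multi-chromosome case raised above, a rough base R sketch (not necessarily the updated solution being referred to) could group by both chr and window and build one window grid per chromosome. The chr8 rows and all meth values below are made up for illustration, and each chromosome's grid is assumed to end at its last occupied window rather than at the true chromosome length.

```r
# Hypothetical two-chromosome input; the chr8 rows and meth values are invented
df <- data.frame(
  chr   = c("chr7", "chr7", "chr7", "chr8", "chr8"),
  start = c(31441, 31467, 50060, 12005, 45500),
  meth  = c(0.2, 0.8, 0.4, 0.1, 0.9),
  stringsAsFactors = FALSE
)
bin <- 10000  # window size in bp
df$window <- df$start %/% bin

# Median per (chr, window) pair that actually has data
med <- aggregate(meth ~ chr + window, data = df, FUN = median)
names(med)[names(med) == "meth"] <- "median"

# Last occupied window per chromosome, as a named numeric vector
last <- vapply(split(df$window, df$chr), max, numeric(1))

# One grid of windows per chromosome, stacked into a single data frame
grid <- do.call(rbind, lapply(names(last), function(ch) {
  data.frame(chr = ch, window = 0:last[[ch]], stringsAsFactors = FALSE)
}))

# Left join keeps empty windows with median = NA
res <- merge(grid, med, by = c("chr", "window"), all.x = TRUE)
res <- res[order(res$chr, res$window), ]
res$start <- res$window * bin
res$stop  <- (res$window + 1) * bin
print(res, row.names = FALSE)
```

Joining on both `chr` and `window` matters here, because the same window index occurs on every chromosome.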