New Column based on another data frame
5.1 years ago
j.lunger18 ▴ 30

Hi, I have a data frame of variants and I would like to determine which domain each variant falls within.

> ranges
  start end domain_name
1     1   3   beginning
2     4   6     middle1
3     7   8     middle2
4     9  11         end

> positions
   ID position
1   a        0
2   b        1
3   c        2
4   d        3
5   e        4
6   f        5
7   g        6
8   h        7
9   i        8
10  j        9
11  k       10
12  l       11
13  m       12
14  n       13

I want to add a column to "positions", which will tell me which domain (and there could be multiple for a single variant...) each position is found in. Thanks!

r domains genome • 911 views
5.1 years ago
Brice Sarver ★ 3.8k

Something like this will work. It assumes non-overlapping domains and needs no special R packages. The values are cast to numeric to avoid any character conflicts. Note that ranges must be a global variable, since the function reads it directly.

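locate_domain <- function(position) {
  # Check each row of the global `ranges` data frame in turn
  for (i in seq_len(nrow(ranges))) {
    r <- as.numeric(ranges[i, 1]):as.numeric(ranges[i, 2])
    if (position %in% r) {
      return(as.character(ranges[i, 3]))
    }
  }
  # No domain contains this position (e.g. 0, 12, 13 in the example)
  NA_character_
}

positions <- cbind(positions, domain = sapply(as.numeric(positions$position), locate_domain))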

This searches for a given position in a range of positions calculated on the fly from each row of the ranges data.frame and returns the matching domain; the result is then added to the positions data.frame with cbind. Alternatively, you could pre-compute the ranges once, store them in a list named by domain, and return the name of the element that contains the position.
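Since the question mentions that a variant could fall in more than one domain, here is a rough sketch of a variant that returns every matching domain instead of stopping at the first hit. It is only an illustration under some assumptions: the helper name locate_domains and the ";" separator are made up for this example, ranges is passed in as an argument rather than read from the global environment, and the bounds are treated as inclusive, as in the code above.

```r
# Example data copied from the question
ranges <- data.frame(
  start = c(1, 4, 7, 9),
  end = c(3, 6, 8, 11),
  domain_name = c("beginning", "middle1", "middle2", "end"),
  stringsAsFactors = FALSE
)
positions <- data.frame(
  ID = letters[1:14],
  position = 0:13,
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): for each position,
# collect all domains whose [start, end] interval contains it.
locate_domains <- function(pos, ranges, sep = ";") {
  domain_names <- as.character(ranges$domain_name)
  vapply(pos, function(p) {
    hits <- domain_names[p >= ranges$start & p <= ranges$end]
    # NA when no domain matches, otherwise the matches joined by `sep`
    if (length(hits) == 0) NA_character_ else paste(hits, collapse = sep)
  }, character(1))
}

positions$domain <- locate_domains(positions$position, ranges)
```

Comparing against start and end directly avoids building the full integer sequence for each domain, which matters when the domains span genomic coordinates. For very large tables, a dedicated interval-overlap package would probably be a better fit than a per-position loop.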
