Finding the Number of Overlapping Values Between Integer Intervals (R)
10.7 years ago
viniciushs88 ▴ 50

I would like to count the integers shared by two integer intervals. My input looks like this:

start1  end1  start2  end2
    20    30      25    35
    25    35      20    30
   100   190     126   226
   126   226     100   190

In the first and second rows, the first interval (the first two columns) and the second interval (the last two columns) share 6 integers (25, 26, 27, 28, 29 and 30).

My expected output looks like this:

start1  end1  start2  end2  bp_overlapped
    20    30      25    35              6
    25    35      20    30              6
   100   190     126   226             65
   126   226     100   190             65

The data are stored as a matrix in R.

Thank you

r overlap • 2.4k views

Please indicate relevance of question to a specific bioinformatics research problem.

10.7 years ago

This has only the most tenuous connection to bioinformatics if I make a number of assumptions about why you're trying to do this. You should really post this on an R forum. Having said that:

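# One row per pair of intervals: start1, end1, start2, end2 (the matrix is filled column by column)
m <- matrix(c(20,25,100,126,30,35,190,226,25,20,126,100,35,30,226,190), ncol=4)
# For each row, count the integers common to start1:end1 and start2:end2
overlap <- apply(m, 1, function(x) length(intersect(x[1]:x[2], x[3]:x[4])))
cbind(m, overlap)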
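
The same count can probably also be obtained arithmetically, without building every sequence, which may matter for long intervals. The sketch below assumes inclusive intervals with start <= end; overlap_count is a hypothetical helper name and the column names are added only for readability:

```r
# Closed-form overlap of inclusive integer intervals [s1, e1] and [s2, e2].
# overlap_count is a hypothetical helper name, not part of the answer above.
overlap_count <- function(s1, e1, s2, e2) {
  # The shared span runs from the larger start to the smaller end;
  # pmax(..., 0) returns 0 when the intervals do not overlap at all.
  pmax(pmin(e1, e2) - pmax(s1, s2) + 1, 0)
}

m <- matrix(c(20, 25, 100, 126,
              30, 35, 190, 226,
              25, 20, 126, 100,
              35, 30, 226, 190), ncol = 4,
            dimnames = list(NULL, c("start1", "end1", "start2", "end2")))

bp_overlapped <- overlap_count(m[, "start1"], m[, "end1"],
                               m[, "start2"], m[, "end2"])
cbind(m, bp_overlapped)
```

Because pmin and pmax are vectorised, all rows are handled in a single call rather than one apply iteration per row.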
10.7 years ago
zx8754 12k

This should work:

# dummy data
df <- read.table(text = "start1  end1    start2  end2
20     30      25      35
25     35      20      30
100     190    126      226
126     226    100      190", header = TRUE)

# Count the integers shared by the two intervals in each row
df$bp_overlapped <-
  sapply(seq_len(nrow(df)), function(x) {
    length(intersect(df[x, 1]:df[x, 2],
                     df[x, 3]:df[x, 4]))
  })
9.8 years ago

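You can use the findOverlaps function from the IRanges package (Bioconductor) in R. Note that it reports which ranges overlap, not how many positions they share. The script is as follows: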

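# Read the tab-separated file and keep the rows for one chromosome
data2 <- read.table("C:/file_name.txt", sep = "\t", fill = TRUE)
data2 <- data2[data2[, 1] == "Chromosome_name", ]
start <- data2[, 2]

# The end coordinate is taken from the last non-NA column of each row
end <- numeric(nrow(data2))
for (i in seq_len(nrow(data2)))
{
  x <- length(data2[i, ]) - sum(is.na(data2[i, ]))
  end[i] <- data2[i, x]
}
chr <- data2[, 1]
genes <- data.frame(chr, start, end)

library(IRanges)
query <- IRanges(start, end)

# Assumes the file has a header row naming the start1 and end1 columns;
# without header = TRUE, result$start1 would be NULL
result <- read.table("C:/GC/chromosome_name.txt/result.txt", header = TRUE)

subject <- IRanges(result$start1, result$end1)
# IntervalTree() is no longer available in recent IRanges releases;
# findOverlaps() accepts the IRanges subject directly
findOverlaps(query, subject, select = "all")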
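
To get the number of shared positions asked for in the question, rather than the list of overlapping pairs, something like the following sketch may work. It assumes the IRanges package is installed and that every pair of intervals overlaps, as in the question's data; pairs that do not overlap at all may need pintersect's resolve.empty argument. The variable names first and second are illustrative only:

```r
library(IRanges)  # Bioconductor package

# Intervals from the question: (start1, end1) and (start2, end2)
first  <- IRanges(start = c(20, 25, 100, 126), end = c(30, 35, 190, 226))
second <- IRanges(start = c(25, 20, 126, 100), end = c(35, 30, 226, 190))

# pintersect() intersects the ranges pair by pair; width() then gives
# the number of positions in each intersection.
bp_overlapped <- width(pintersect(first, second))
bp_overlapped
```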