GenomicRanges: subset rows based on overlap indices under two or more conditions, and add a column from the match
6.1 years ago
User 7754 ▴ 270

I am trying to extract rows that satisfy two conditions, using the indices returned by two overlap searches. This is an example:

df1 = data.frame(chr=c("chr1", "chr1"), start=c(20,21), stop=c(28,29), value1=c(1,2))
df2 = data.frame(chr=c("chr1", "chr1", "chr1"), start=c(20,22, 28), stop=c(22,24,34), value2=c(3,4, 60))

df3 = data.frame(chr=c("chr1", "chr1"), start=c(3,1), stop=c(8,4))
df4 = data.frame(chr=c("chr1", "chr1", "chr2"), start=c(10,1, 1), stop=c(12,2, 2))

df1_all = cbind.data.frame(df1, df3)
df2_all = cbind.data.frame(df2, df4)

Which looks like this:

> df1_all
   chr start stop value1  chr start stop 
1 chr1    20   28      1 chr1     3    8      
2 chr1    21   29      2 chr1     1    4      

> df2_all
   chr start stop value2  chr start stop 
1 chr1    20   22      3 chr1    10   12      
2 chr1    22   24      4 chr1     1    2      
3 chr1    28   34     60 chr2     1    2

I would like to get the rows from data frame df1_all, together with the matching column from df2_all called "value2", but only for pairs of rows where both df1 overlaps df2 and df3 overlaps df4. In this case the result would be:

 chr start stop value1  chr start stop value2
chr1    21   29      2 chr1     1    4      4
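
To make the condition concrete, here is a minimal brute-force sketch in base R. It assumes the intended rule is that row i of df1_all is paired with row j of df2_all when df1[i] overlaps df2[j] and df3[i] overlaps df4[j], which is what the expected output suggests. The helper `overlaps` and the variable `pairs` are hypothetical names introduced only for this illustration, and intervals are treated as closed.

```r
# Toy data from the question
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# Hypothetical helper: TRUE when closed intervals a[i] and b[j] are on the
# same chromosome and share at least one position
overlaps <- function(a, i, b, j) {
  as.character(a$chr[i]) == as.character(b$chr[j]) &&
    a$start[i] <= b$stop[j] &&
    b$start[j] <= a$stop[i]
}

# Enumerate every (i, j) row pair and keep those passing both conditions
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
keep <- mapply(
  function(i, j) overlaps(df1, i, df2, j) && overlaps(df3, i, df4, j),
  pairs$i, pairs$j
)
matched <- pairs[keep, ]
print(matched)
```

On the toy data only the pair i = 2, j = 2 survives, which corresponds to the single row shown above. This quadratic scan is only meant to pin down the rule on small inputs, not to be used on real data.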

I am almost there, but I am still getting something wrong in my real data and I cannot find the bug. I have been trying to find a solution for a long time, so I am coming here for help and a fresh set of eyes on this problem. Can you please help?

This is what I have:

library(GenomicRanges)

# Convert each data frame to a GRanges object
df1.gr = makeGRangesFromDataFrame(df1)
df2.gr = makeGRangesFromDataFrame(df2)
df3.gr = makeGRangesFromDataFrame(df3)
df4.gr = makeGRangesFromDataFrame(df4)
# First overlap
hits1 <- findOverlaps(df1.gr, df2.gr, maxgap = 0)
values1 <- rep(FALSE, nrow(df2_all))
values1[unique(subjectHits(hits1))] <- TRUE

OBJ= data.frame(df1_all[unique(queryHits(hits1)),],
matched.df2 = df2_all[unique(queryHits(hits1)),"value2"])

# Second overlap
hits2 <- findOverlaps(df3.gr, df4.gr, maxgap = 0)
values2 <- rep(FALSE, nrow(df2_all))
values2[unique(subjectHits(hits2))] <- TRUE

ov = values1 & values2
OBJ = OBJ[ov,]
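
One possible reason the attempt above misbehaves is that it tracks which rows are hit at all, not which (query, subject) pairs are hit in both searches, and it indexes df2_all with queryHits rather than subjectHits. The sketch below is one way to pair the hits, not a definitive fix. It assumes a pair (i, j) should be kept only when it appears in both hit sets, uses the default findOverlaps settings rather than maxgap = 0, and introduces the names `key1`, `key2`, `q`, `s`, and `result` purely for illustration.

```r
library(GenomicRanges)

# Toy data from the question
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# Each findOverlaps call returns (query index, subject index) pairs
hits1 <- findOverlaps(makeGRangesFromDataFrame(df1), makeGRangesFromDataFrame(df2))
hits2 <- findOverlaps(makeGRangesFromDataFrame(df3), makeGRangesFromDataFrame(df4))

# Encode each pair as a string key so the two hit sets can be compared
key1 <- paste(queryHits(hits1), subjectHits(hits1))
key2 <- paste(queryHits(hits2), subjectHits(hits2))
in_both <- key1 %in% key2

# Row indices of the pairs that satisfy both overlap conditions
q <- queryHits(hits1)[in_both]    # rows of df1 / df3
s <- subjectHits(hits1)[in_both]  # rows of df2 / df4

# Query-side rows plus the matched value2 from the subject side
result <- cbind(df1[q, ], df3[q, ], value2 = df2$value2[s])
print(result)
```

Keying on the pair rather than on each side separately keeps the row of df1_all and its matched value2 aligned, even when one query row overlaps several subject rows.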
genomicRanges overlaps R • 1.0k views