I am trying to extract rows that satisfy two conditions, using the indices of two overlaps. Here is an example:
df1 = data.frame(chr=c("chr1", "chr1"), start=c(20,21), stop=c(28,29), value1=c(1,2))
df2 = data.frame(chr=c("chr1", "chr1", "chr1"), start=c(20,22, 28), stop=c(22,24,34), value2=c(3,4, 60))
df3 = data.frame(chr=c("chr1", "chr1"), start=c(3,1), stop=c(8,4))
df4 = data.frame(chr=c("chr1", "chr1", "chr2"), start=c(10,1, 1), stop=c(12,2, 2))
df1_all = cbind.data.frame(df1, df3)
df2_all = cbind.data.frame(df2, df4)
Which looks like this:
> df1_all
   chr start stop value1  chr start stop
1 chr1    20   28      1 chr1     3    8
2 chr1    21   29      2 chr1     1    4
> df2_all
   chr start stop value2  chr start stop
1 chr1    20   22      3 chr1    10   12
2 chr1    22   24      4 chr1     1    2
3 chr1    28   34     60 chr2     1    2
I would like to get the rows from data frame df1_all, together with the matching value from the df2_all column called "value2", but only for row pairs where df1 overlaps df2 and df3 overlaps df4. In this case it would be:
chr start stop value1 chr start stop value1 value2
chr1 21 29 2 chr1 1 4 2 4
I am almost there, but I am still getting something wrong in my real data and I cannot find the bug. I have been trying to find a solution for a long time now, so I am coming here for help and a fresh set of eyes on this problem. Can you please help?
This is what I have:
library(GenomicRanges)
df1.gr = makeGRangesFromDataFrame(df1)
df2.gr = makeGRangesFromDataFrame(df2)
df3.gr = makeGRangesFromDataFrame(df3)
df4.gr = makeGRangesFromDataFrame(df4)
# First overlap
hits1 <- findOverlaps(df1.gr, df2.gr, maxgap = 0)
values1 <- rep(FALSE, nrow(df2_all))
values1[unique(subjectHits(hits1))] <- TRUE
OBJ= data.frame(df1_all[unique(queryHits(hits1)),],
matched.df2 = df2_all[unique(queryHits(hits1)),"value2"])
# Second overlap
hits2 <- findOverlaps(df3.gr, df4.gr, maxgap = 0)
values2 <- rep(FALSE, nrow(df2_all))
values2[unique(subjectHits(hits2))] <- TRUE
ov = values1 & values2
OBJ = OBJ[ov,]
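To make the intended logic explicit, independent of GenomicRanges, the condition can be spelled out as a brute-force check in base R. This is only a minimal sketch on the toy data above. `overlap_matrix` is a hypothetical helper (not a GenomicRanges function). It assumes that two closed intervals on the same chromosome overlap when they share at least one position. It compares every row pair, so it is meant as a reference for checking results on small inputs, not as something to run on large data.

```r
# Toy data from the example above
df1 <- data.frame(chr = c("chr1", "chr1"), start = c(20, 21), stop = c(28, 29), value1 = c(1, 2))
df2 <- data.frame(chr = c("chr1", "chr1", "chr1"), start = c(20, 22, 28), stop = c(22, 24, 34), value2 = c(3, 4, 60))
df3 <- data.frame(chr = c("chr1", "chr1"), start = c(3, 1), stop = c(8, 4))
df4 <- data.frame(chr = c("chr1", "chr1", "chr2"), start = c(10, 1, 1), stop = c(12, 2, 2))

# Hypothetical helper: returns a logical matrix m where m[i, j] is TRUE when
# row i of `a` and row j of `b` are on the same chromosome and their closed
# intervals [start, stop] share at least one position (assumption).
overlap_matrix <- function(a, b) {
  a_chr <- as.character(a$chr)
  b_chr <- as.character(b$chr)
  outer(seq_len(nrow(a)), seq_len(nrow(b)), function(i, j) {
    a_chr[i] == b_chr[j] & a$start[i] <= b$stop[j] & b$start[j] <= a$stop[i]
  })
}

# A pair (i, j) is kept only if df1[i] overlaps df2[j] AND df3[i] overlaps df4[j]
both <- overlap_matrix(df1, df2) & overlap_matrix(df3, df4)
pairs <- which(both, arr.ind = TRUE)  # matrix with columns "row" (i) and "col" (j)

# Rows of df1/df3 for the kept pairs, plus the matching value2 from df2
result <- data.frame(df1[pairs[, "row"], ],
                     df3[pairs[, "row"], ],
                     value2 = df2$value2[pairs[, "col"]])
print(result)
```

The point of the sketch is that the two conditions are tested on the same (i, j) pair, rather than on the query and subject indices separately.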