Hi,
I have a set of intervals in a data frame and a query interval range. All I want is to find the intervals that overlap not only the query range but also the subsequent ranges. For example, consider the following data frame:
df=data.frame(Id=rep("A1",23),start=c(11176,11176,11176,11176,11176,11176,11176,11177,11177,11177,11177,11177,11177,11178,11178,11179,11179,11179,11233,11233,11233,11233,11233),end=11205,11206,11206,11206,11206,11206,11207,11206,11206,11208,11206,11208,11209,11206,11206,11203,11204,11204,11263,11263,11263,11263,11264))
Suppose my query interval is 11176 to 11205. Then, in the data frame df, I would like to find the intervals that overlap my query interval, and also the intervals that overlap those overlapping intervals.
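One way to read "intervals that overlap the overlapping intervals" is as a transitive closure: keep adding rows that overlap anything already collected until nothing new is added. A minimal base-R sketch, assuming closed intervals [start, end] and a single Id; `chained_overlaps` is a hypothetical name, and the data frame is rebuilt with the missing `c(` after `end=`. Because every collected row is connected to the query by a chain of overlaps, their union is one contiguous range, so tracking its overall span is enough.

```r
# Question's data, with the missing c( after end= added
df <- data.frame(
  Id    = rep("A1", 23),
  start = rep(c(11176, 11177, 11178, 11179, 11233), times = c(7, 6, 2, 3, 5)),
  end   = c(11205, 11206, 11206, 11206, 11206, 11206, 11207,
            11206, 11206, 11208, 11206, 11208, 11209,
            11206, 11206, 11203, 11204, 11204,
            11263, 11263, 11263, 11263, 11264)
)

# Hypothetical helper: return every row connected to [q_start, q_end]
# through a chain of overlapping closed intervals.
chained_overlaps <- function(df, q_start, q_end) {
  lo <- q_start
  hi <- q_end
  hit <- rep(FALSE, nrow(df))
  repeat {
    # rows sharing at least one point with the current span [lo, hi]
    new_hit <- df$start <= hi & df$end >= lo
    if (identical(new_hit, hit)) break  # nothing new was added
    hit <- new_hit
    # widen the span to cover everything collected so far
    lo <- min(lo, df$start[hit])
    hi <- max(hi, df$end[hit])
  }
  df[hit, ]
}

res <- chained_overlaps(df, 11176, 11205)
range(res$start)  # 11176 11179
max(res$end)      # 11209
```

On this data the 18 rows starting between 11176 and 11179 are returned, while the five rows starting at 11233 are not, because none of them touches the span 11176 to 11209.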
Below is my R code, but for some reason it is not giving me the output I want. I expect the output 11179 and 11204, but somehow my code outputs only the range 11178 and 11206.
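temp_start = 11176   # query start
temp_end = 11206     # query end
for (i in 1:nrow(df))
{
  # reset to the query range at the start of every iteration
  final_start = temp_start
  final_end = temp_end
  # findInterval(x, c(start, end), rightmost.closed = TRUE, left.open = TRUE) == 1
  # is TRUE when x lies in the closed interval [start, end]
  if ((findInterval(final_end, c(df$start[i], df$end[i]), rightmost.closed = TRUE, left.open = TRUE) == 1L) ||
      (findInterval(final_start, c(df$start[i], df$end[i]), rightmost.closed = TRUE, left.open = TRUE) == 1L))
  {
    # take this row's interval and print it
    final_start = df$start[i]
    final_end = df$end[i]
    print(final_start)
    print(final_end)
  }
}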
The above code takes query_start (11176) and query_end (11206) as input. It then checks whether either temp_start or temp_end lies within the interval of the current row of df. If it does, that row's interval is taken, and the next iteration of the for loop should check whether this interval's start or end lies within the next row's interval.
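The loop described above is meant to carry the last matched interval forward to the next row. A minimal sketch of that reading, assuming closed intervals and a general overlap test (start_i <= current end and end_i >= current start) rather than checking only whether the current endpoints fall inside row i; `last_chained_interval` is a hypothetical name, and the data frame is rebuilt with the missing `c(` after `end=`. On this data, with the query 11176 to 11206, it returns start 11179 and end 11204.

```r
# Question's data, with the missing c( after end= added
df <- data.frame(
  Id    = rep("A1", 23),
  start = rep(c(11176, 11177, 11178, 11179, 11233), times = c(7, 6, 2, 3, 5)),
  end   = c(11205, 11206, 11206, 11206, 11206, 11206, 11207,
            11206, 11206, 11208, 11206, 11208, 11209,
            11206, 11206, 11203, 11204, 11204,
            11263, 11263, 11263, 11263, 11264)
)

# Hypothetical helper: walk the rows in order and replace the current
# interval with row i whenever the two closed intervals overlap.
last_chained_interval <- function(df, q_start, q_end) {
  cur_start <- q_start
  cur_end <- q_end
  for (i in seq_len(nrow(df))) {
    # [start_i, end_i] and [cur_start, cur_end] share at least one point
    if (df$start[i] <= cur_end && df$end[i] >= cur_start) {
      cur_start <- df$start[i]
      cur_end <- df$end[i]
    }
  }
  c(start = cur_start, end = cur_end)
}

res <- last_chained_interval(df, 11176, 11206)
res  # start = 11179, end = 11204
```

Compared with the original loop, the current interval is carried over between iterations instead of being reset to the query, and the overlap test also catches rows lying entirely inside the current interval (for example 11179 to 11203 inside 11178 to 11206), which checking only the current endpoints misses.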
Any guidance would be highly appreciated. Thanks in advance.
There is a typo in your first example: you should add "c(" after "end=" in the declaration of df. Moreover, this data frame seems to contain many identical duplicate rows. In any case, I would recommend GRanges (from the Bioconductor GenomicRanges package) for working with genomic ranges.
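In GenomicRanges, reduce() merges overlapping (and, by default, adjacent) ranges. For comparison, a dependency-free sketch of a similar clustering idea in base R, assuming closed integer intervals on a single Id: drop the duplicate rows, sort by start, and open a new cluster whenever a row starts after the largest end seen so far. `cluster_intervals` is a hypothetical name, and the data frame is rebuilt with the missing `c(` after `end=`.

```r
# Question's data, with the missing c( after end= added
df <- data.frame(
  Id    = rep("A1", 23),
  start = rep(c(11176, 11177, 11178, 11179, 11233), times = c(7, 6, 2, 3, 5)),
  end   = c(11205, 11206, 11206, 11206, 11206, 11206, 11207,
            11206, 11206, 11208, 11206, 11208, 11209,
            11206, 11206, 11203, 11204, 11204,
            11263, 11263, 11263, 11263, 11264)
)

# Hypothetical helper: label clusters of overlapping closed intervals
# (single Id assumed), similar in spirit to reduce() in GenomicRanges.
cluster_intervals <- function(df) {
  df <- unique(df)                     # drop duplicated rows
  df <- df[order(df$start, df$end), ]  # sort by start, then end
  # largest end among all earlier rows (-Inf for the first row)
  prev_max_end <- c(-Inf, cummax(df$end)[-nrow(df)])
  # a row that starts after every earlier end opens a new cluster
  df$cluster <- cumsum(df$start > prev_max_end)
  df
}

res <- cluster_intervals(df)
# cluster(s) touching the query 11176-11205
hit <- unique(res$cluster[res$start <= 11205 & res$end >= 11176])
res[res$cluster %in% hit, ]
```

Here the 11 distinct rows fall into two clusters: cluster 1 spans 11176 to 11209 and contains the intervals chained to the query, while the rows starting at 11233 form cluster 2.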