Partial or complete overlap of two genomic ranges
10.6 years ago
Jimbou ▴ 960

Hello,

I need your help. I want to compare two data frames of genomic features (more precisely, frame 2 compared to frame 1), like this:

chr start end name        chr start end name
1   2     5   A           1   1     3   AA
2   10    15  B           2   9     16  BB
3   27    30  C           3   28    29  CC

As a result I want this:

  1. Completely overlapping:

    chr start end name    chr start end name
    2   10    15   B       2    9    16    BB
    
  2. Partial overlap:

    chr start end  name    chr start end   name 
    1    2    5    A        1    1    3    AA
    
  3. and within the range:

    chr start end name        chr start end   name 
    3   27    30  C            3    28   29    CC
    

Are the R packages IRanges and GenomicRanges suitable for such an analysis? Or do I have to write the > and < comparisons myself?

GRange sequence R • 12k views
10.0 years ago
komal.rathi ★ 4.1k

Using the GenomicRanges library in R:

library(GenomicRanges)
x1 = read.table(text="chr start end name  
1    2    5   A
2   10    15  B
3   27    30  C",header=T)

x2 = read.table(text="chr start end name
1    1    3   AA
2    9    16  BB
3   28    29  CC",header=T)

# Make GRanges object
gr1 = with(x1, GRanges(chr, IRanges(start = start, end = end, names = name)))
gr2 = with(x2, GRanges(chr, IRanges(start = start, end = end, names = name)))

# Completely overlapping (the gr1 range lies entirely within the gr2 range)
type1 = findOverlaps(query = gr1, subject = gr2, type = "within")
type1.df = data.frame(x1[queryHits(type1),], x2[subjectHits(type1),])
type1.df
  chr start end name chr.1 start.1 end.1 name.1
2   2    10  15    B     2       9    16     BB

# Within range (the gr2 range lies entirely within the gr1 range)
type3 = findOverlaps(query = gr2, subject = gr1, type = "within")
type3.df = data.frame(x1[subjectHits(type3),], x2[queryHits(type3),])
type3.df
  chr start end name chr.1 start.1 end.1 name.1
3   3    27  30    C     3      28    29     CC

# Partial Overlaps only (no complete overlaps or within range overlaps)
type2 = findOverlaps(query = gr1, subject = gr2, type = 'any')
type2.df = data.frame(x1[queryHits(type2),], x2[subjectHits(type2),])
x = rbind(type1.df, type2.df, type3.df)
type2.df = x[!(duplicated(x) | duplicated(x, fromLast = TRUE)), ]
type2.df
   chr start end name chr.1 start.1 end.1 name.1
21   1     2   5    A     1       1     3     AA

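You will get three different data frames, as per your question. Based on your example, I am assuming type 1 and type 3 are opposites: in type 1 the frame 1 range lies within the frame 2 range, and in type 3 the frame 2 range lies within the frame 1 range.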
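The other option raised in the question, plain > and < comparisons, also works for small tables. Below is a minimal sketch in base R, assuming inclusive start and end coordinates as in the example. The `.1`/`.2` column suffixes and the label strings are made up for illustration and are not part of any package.

```r
# Example data from the question (frame 1 and frame 2)
f1 <- data.frame(chr = c(1, 2, 3), start = c(2, 10, 27), end = c(5, 15, 30),
                 name = c("A", "B", "C"))
f2 <- data.frame(chr = c(1, 2, 3), start = c(1, 9, 28), end = c(3, 16, 29),
                 name = c("AA", "BB", "CC"))

# Pair every range in f1 with every range in f2 on the same chromosome
pairs <- merge(f1, f2, by = "chr", suffixes = c(".1", ".2"))

# Keep only pairs that share at least one position
pairs <- pairs[pairs$start.1 <= pairs$end.2 & pairs$start.2 <= pairs$end.1, ]

# Containment tests in both directions
f1_in_f2 <- pairs$start.1 >= pairs$start.2 & pairs$end.1 <= pairs$end.2
f2_in_f1 <- pairs$start.2 >= pairs$start.1 & pairs$end.2 <= pairs$end.1

# Hypothetical labels; identical ranges end up as "frame1_within_frame2"
pairs$overlap <- ifelse(f1_in_f2, "frame1_within_frame2",
                 ifelse(f2_in_f1, "frame2_within_frame1", "partial"))
pairs[, c("name.1", "name.2", "overlap")]
```

merge() builds every same-chromosome pair before filtering, so this is only sensible for small inputs; for genome-scale data findOverlaps() is the better tool.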


How do I find out intergenic locations?


check the valr package
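If installing another package is not an option, a rough base R sketch can list the gaps between genes on each chromosome. It assumes a data frame of gene coordinates with inclusive start and end (the `genes` table below is made up), and it skips the stretches before the first gene and after the last one, because those need chromosome lengths.

```r
# Hypothetical gene coordinates; the first two genes on chr 1 overlap
genes <- data.frame(chr   = c("1", "1", "1", "2"),
                    start = c(2, 4, 20, 10),
                    end   = c(5, 9, 25, 15))

intergenic <- do.call(rbind, lapply(split(genes, genes$chr), function(g) {
  g <- g[order(g$start), ]
  n <- nrow(g)
  # Running maximum of end copes with overlapping or nested genes
  covered_to <- cummax(g$end)
  gap_start <- covered_to[-n] + 1
  gap_end <- g$start[-1] - 1
  # Keep only real gaps (at least one uncovered position)
  keep <- gap_start <= gap_end
  data.frame(chr = g$chr[-n][keep], start = gap_start[keep], end = gap_end[keep])
}))
intergenic
```

For the toy table this leaves a single gap on chromosome 1, between the merged 2-9 block and the gene starting at 20.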


Not sure if this is new in IRanges but I tested using type='any' and it also gave me ranges from the query that were completely within the subject.
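As far as I can tell this is the expected behaviour rather than something new: type = "any" reports every pair that shares at least one base, containment included. A small sketch, using the ranges from the question, that splits the "any" hits by comparing coordinates (the variable names are only for illustration):

```r
library(GenomicRanges)

# Ranges from the question: gr1 is frame 1, gr2 is frame 2
gr1 <- GRanges(c("1", "2", "3"), IRanges(start = c(2, 10, 27), end = c(5, 15, 30)))
gr2 <- GRanges(c("1", "2", "3"), IRanges(start = c(1, 9, 28), end = c(3, 16, 29)))

# type = "any" returns partial overlaps and containment alike
hits <- findOverlaps(query = gr1, subject = gr2, type = "any")
q <- gr1[queryHits(hits)]
s <- gr2[subjectHits(hits)]

# Compare the coordinates of each matched pair
q_in_s <- start(q) >= start(s) & end(q) <= end(s)   # query inside subject
s_in_q <- start(s) >= start(q) & end(s) <= end(q)   # subject inside query

# Hits that are neither kind of containment are the partial overlaps
partial <- hits[!(q_in_s | s_in_q)]
partial
```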

10.6 years ago

Take a look at BEDOPS bedmap --fraction-map, which allows you to recover map elements (those on your right-hand side) which overlap reference elements (those on your left-hand side) by some fractional value between 0 and 1, inclusive (i.e. between 0 and 100%).

For instance, if your reference and map data sets are sorted BED files called A and B, respectively, then you could do:

$ bedmap --echo --echo-map --fraction-map 1 A B

to get all elements from set B that fall entirely within an element in set A (the overlap must cover 100% of the B element).

These tools can run from the command line or within R, via system() calls.
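A sketch of the R route, assuming bedmap is on the PATH and that the sorted BED files A and B sit in the working directory. It uses system2() rather than system() so the arguments can be passed as a vector; the variable names are made up.

```r
# Same options as the command-line call above
bedmap_args <- c("--echo", "--echo-map", "--fraction-map", "1", "A", "B")

if (nzchar(Sys.which("bedmap")) && all(file.exists(c("A", "B")))) {
  # stdout = TRUE captures the output lines as a character vector
  hits <- system2("bedmap", args = bedmap_args, stdout = TRUE)
  # bedmap puts "|" between the echoed reference element and its map elements
  parts <- strsplit(hits, "|", fixed = TRUE)
  print(head(parts))
} else {
  message("bedmap or the input files were not found; nothing was run")
}
```

Each element of `parts` then holds the reference line first and the mapped elements second, with multiple mapped elements separated by ";".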
