Question

Group and merge entries in a table according to a second one

0

Entering edit mode

6.0 years ago

ste.lu ▴ 80

Hi All,

I have two tables so structured:

Table1:
chr1    1   16007   8899
chr1    16008   24571   294334
chr1    24572   27981   192966
chr1    27982   30429   708562
chr1    30430   32153   610786
chr1    32154   32774   837824
chr1    32775   37752   177990
chr1    37753   38369   9374
chr1    38370   38791   367385
chr1    38792   39255   552088
...

Table 2:
chr1    1   111 13177   2005
chr1    112 223 9221    7796
chr1    224 340 2625    5833
chr1    341 461 14128   1124
chr1    462 580 3211    12051
chr1    581 701 9551    1563
chr1    702 825 10751   6891
chr1    826 939 3301    11244
chr1    940 1059    8072    4022
chr1    1060    1173    380 5584
...

what I need to do is to group and merge all the data in "table 2" to in the categories of "table 1". The relevant columns are 1:3.

if table2(x;y) is a subset of table1 (i;j), then make a table 3 where all the entry (x;y) are merged according to (i:j). For instance, in this example, all the entry of table2 are a subset of the first entry of table 1. In the end, I'd like to have:

Table3:
chr1    1   16007

R next-gen assembly • 998 views

ADD COMMENT • link 6.0 years ago by ste.lu ▴ 80

1

Entering edit mode

Maybe you can use bedtools in R for this, like described here.

ADD REPLY • link 6.0 years ago by Benn 8.3k

2

Entering edit mode

Bedtools was the easier!!

bedtools intrestect -a table1 -b table2 -wa > table3

and then filter table 3 for the repetitive entries

awk '!a[$0]++'  table3

ADD REPLY • link 6.0 years ago by ste.lu ▴ 80

1

Entering edit mode

Great solution! I recommended to do it in R because you tagged "R".

ADD REPLY • link 6.0 years ago by Benn 8.3k

1

Entering edit mode

Have a look at GenomicRanges/IRanges and its intersection/overlapping functions. You simply have to check for overlaps, and if TRUE, then return the entry of table1. The vignettes contain plenty of examples.

ADD REPLY • link 6.0 years ago by ATpoint 85k