Group and merge entries in a table according to a second one
0
0
Entering edit mode
6.0 years ago
ste.lu ▴ 80

Hi All,

I have two tables so structured:

Table1:
chr1    1   16007   8899
chr1    16008   24571   294334
chr1    24572   27981   192966
chr1    27982   30429   708562
chr1    30430   32153   610786
chr1    32154   32774   837824
chr1    32775   37752   177990
chr1    37753   38369   9374
chr1    38370   38791   367385
chr1    38792   39255   552088
...

Table 2:
chr1    1   111 13177   2005
chr1    112 223 9221    7796
chr1    224 340 2625    5833
chr1    341 461 14128   1124
chr1    462 580 3211    12051
chr1    581 701 9551    1563
chr1    702 825 10751   6891
chr1    826 939 3301    11244
chr1    940 1059    8072    4022
chr1    1060    1173    380 5584
...

what I need to do is to group and merge all the data in "table 2" to in the categories of "table 1". The relevant columns are 1:3.

if table2(x;y) is a subset of table1 (i;j), then make a table 3 where all the entry (x;y) are merged according to (i:j). For instance, in this example, all the entry of table2 are a subset of the first entry of table 1. In the end, I'd like to have:

Table3:
chr1    1   16007
R next-gen assembly • 997 views
ADD COMMENT
1
Entering edit mode

Maybe you can use bedtools in R for this, like described here.

ADD REPLY
2
Entering edit mode

Bedtools was the easier!!

bedtools intrestect -a table1 -b table2 -wa > table3

and then filter table 3 for the repetitive entries

awk '!a[$0]++'  table3
ADD REPLY
1
Entering edit mode

Great solution! I recommended to do it in R because you tagged "R".

ADD REPLY
1
Entering edit mode

Have a look at GenomicRanges/IRanges and its intersection/overlapping functions. You simply have to check for overlaps, and if TRUE, then return the entry of table1. The vignettes contain plenty of examples.

ADD REPLY

Login before adding your answer.

Traffic: 1617 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6