I have two sorted data tables (A and B), in table A, 3 first columns are genomic coordinates. I like to extract some specific rows from A table but based on some common columns/values of B table.
For example: First, I would like to match column 6 of A table and column 1 of B table, then if there is a 1
value in the third column of B table, select those rows (from A table) which have the same value (first match) in the fourth column of A and B tables and the coordinates before. But if there is a - 1
value in the third column of B table, select those rows (from A table) which have the same value (last match) in the fourth column of A and B tables and the coordinates after.
Input A:
17 50094737 50094898 1 ENST00000007708 ENSG00000005882
17 50096132 50096286 2 ENST00000007708 ENSG00000005882
17 50097423 50097564 3 ENST00000007708,ENST00000503176,ENST00000614357 ENSG00000005882
17 50105371 50105442 3 ENST00000007708,ENST00000503176,ENST00000614357 ENSG00000005882
17 50109957 50111058 1 ENST00000007708 ENSG00000005882
17 50109957 50111368 2 ENST00000614357 ENSG00000005882
17 50109957 50112152 1 ENST00000503176 ENSG00000005882
8 22987417 22987563 2 ENST00000519685 ENSG00000008853
8 22999579 23000105 1 ENST00000251822 ENSG00000008853
8 23004425 23004626 2 ENST00000251822,ENST00000519685 ENSG00000008853
8 23005372 23005475 2 ENST00000251822,ENST00000519685 ENSG00000008853
8 23006728 23007746 2 ENST00000251822,ENST00000519685 ENSG00000008853
8 23015638 23015743 2 ENST00000251822,ENST00000519685 ENSG00000008853
8 23017252 23020199 1 ENST00000251822 ENSG00000008853
X 119539386 119539789 1 ENST00000644802 ENSG00000018610
X 119541328 119541420 2 ENST00000320339,ENST00000644802 ENSG00000018610
X 119544353 119544491 2 ENST00000320339,ENST00000644802 ENSG00000018610
X 119560268 119560385 2 ENST00000320339,ENST00000644802 ENSG00000018610
X 119565232 119565401 1 ENST00000644802 ENSG00000018610
10 14518560 14521306 1 ENST00000181796 ENSG00000065809
10 14521158 14521306 1 ENST00000479731 ENSG00000065809
10 14521869 14522019 8 ENST00000181796,ENST00000378458,ENST00000378467,ENST00000378470,ENST00000468747,ENST00000478076,ENST00000479731,ENST00000622567 ENSG00000065809
10 14530332 14530515 8 ENST00000181796,ENST00000378458,ENST00000378467,ENST00000378470,ENST00000468747,ENST00000478076,ENST00000479731,ENST00000622567 ENSG00000065809
10 14553322 14553387 2 ENST00000378458,ENST00000378467 ENSG00000065809
10 14571763 14572189 2 ENST00000378458,ENST00000622567 ENSG00000065809
10 14572229 14572314 1 ENST00000479731 ENSG00000065809
10 14667634 14667691 1 ENST00000181796 ENSG00000065809
10 14774253 14774897 1 ENST00000181796 ENSG00000065809
Input B:
ENSG00000005882 PDK2 1 3
ENSG00000008853 RHOBTB2 1 2
ENSG00000018610 CXorf56 -1 2
ENSG00000065809 FAM107B -1 8
Desire Output:
17 50094737 50094898 1 ENST00000007708 ENSG00000005882
17 50096132 50096286 2 ENST00000007708 ENSG00000005882
17 50097423 50097564 3 ENST00000007708,ENST00000503176,ENST00000614357 ENSG00000005882
8 22987417 22987563 2 ENST00000519685 ENSG00000008853
X 119560268 119560385 2 ENST00000320339,ENST00000644802 ENSG00000018610
X 119565232 119565401 1 ENST00000644802 ENSG00000018610
10 14530332 14530515 8 ENST00000181796,ENST00000378458,ENST00000378467,ENST00000378470,ENST00000468747,ENST00000478076,ENST00000479731,ENST00000622567 ENSG00000065809
10 14553322 14553387 2 ENST00000378458,ENST00000378467 ENSG00000065809
10 14571763 14572189 2 ENST00000378458,ENST00000622567 ENSG00000065809
10 14572229 14572314 1 ENST00000479731 ENSG00000065809
10 14667634 14667691 1 ENST00000181796 ENSG00000065809
10 14774253 14774897 1 ENST00000181796 ENSG00000065809
Thanks a lot in advance for any possible solution!
Hi star, show us what you have tried, now it is like you want us to do your job...
I have tried with If else but, I do not know how to select first and the last match for 1 and -1 value or the coordinates after and before them.
It the cod that I have tried but it is not what I want:
I think learning some basic tricks with
dplyr
would be useful for you.NB. There seems to be commas in your first table, but not consistent. R might get troubles with that.
I get a slightly different output than your desired one, but maybe this helps you already