Matching Column Elements
1
0
Entering edit mode
11.1 years ago
robjohn7000 ▴ 110

Hi,

I want to compare the elements in column V2 of two tables (below): I'm interested in rows where the elements of columns are close to each other by 3 - 6. The two tables may have different numbers of rows.

For example, row 1 in both tables 1 and 2 differ by 3, I want to add the two elements together and output the mean along with other elements wherever the criterion is satisfied (difference of between 3 - 6) occurs as in the example output.

Table1:

V1    V2    V3    V4    
TX14566 200    27    10
TX14566 187    27    10
TX14566 723    27    10
TX14566 17    27    10

Table2:

V1    V2    V3    V4    
TX14566 203    27    10
TX14566 200    27    10
TX14566 729    27    10
TX14566 39    27    10

Example output (only rows 1 and 3 satisfying the criterion are outputted):

V1    V2    V3    V4    
TX14566 202    27    10
TX14566 726    27    10

I have tried using merge function, thus:

xx <- merge(Tab1,Tab2,by=intersect(Tab1$V2,Tab1$V2+3:4)

which did not work. I'm unsure about the correct way to write the code in R and would be grateful for help. Thanks

r bioconductor • 2.2k views
ADD COMMENT
1
Entering edit mode

Please make it more clear. If two tables can have different number of rows than what should be used as an primary identifier to know which two exact rows should be compared. Here all the rows have the same primary identifier i.e. TX14566. Also, what if the values are not different in the first column of a row but some other column. Do you still want to take the mean or ignore them. What if values differ by more than 6 units in one column and less than 3 in other columns. You will have to come up with clear rules. Take some time and create a good example.

ADD REPLY
0
Entering edit mode
11.1 years ago

The following will show you which rows are separated by the distance range you want. It's completely unclear how you might want dataframes with differing dimensions handled.

#given dataframes foo1 and foo2
foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]
ADD COMMENT
0
Entering edit mode

Thanks guys! Apologies for my lack of clarity. ashutoshmits - you're right that the same primary identifier ( TX14566) makes the whole thing more difficult than it should be. Only rows satisfying the distance rule are retained, while others are to be eliminated. Following on from dpryan79 code then, which allows me to know the rows satisfying the rule, I will like to find the means of the two values from the two data frames and get a final output as in the example output above (200+203=202; 723+729=726).

Although got errors after modifying the code:

foo1[which(abs(foo1$V2-foo2$V2) %in% c(3:6)),]
file1[which(abs(file1$V2-file2$V2) %in% c(3:6)),]

errors:

<0 rows> (or 0-length row.names)
 Warning message:
 In file1 - file2 :
 longer object length is not a multiple of shorter object length
ADD REPLY

Login before adding your answer.

Traffic: 2132 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6