Compare two data frames column by column
9.8 years ago
MAPK ★ 2.1k

Hi everyone,

I have two CSV files where the content of one file is also in the other, but not in exactly the same column/row positions. The headers and IDs are the same, but the corresponding contents are not always the same. Is there a way to look up each ID/header entry of one file in the other and check whether they match? I would prefer to do this in R, but I am not sure how to get it done. Please advise.

Thank you.

Here are the files. I want to see exactly where they differ.

File 1

     mpk    mkk    mkkk   Rdh    Jak
1    5,6    22,22  18,19  20,20  50,40
2    10,19  12,14  19,22  34,50  40,44
3    11,13  15,16  22,22  50,50  34,38

File 2

     mpk    mkk    mkkk   Rdh    Jak
1    8,8    11,11  18,19  20,20  50,40
2    10,19  12,14  19,22  16,18  40,44
3    11,13  15,16  22,22  50,50  3,3

This is not a bioinformatics problem, especially with the example given!


OK, I changed it. Here is the actual problem; I made up the old example to help you understand.


So basically, you are trying to check whether two particular columns from two different files are the same, disregarding their titles?

9.8 years ago
Josh Herr 5.8k

In addition to Sam's answer, this can be done pretty easily in R with the merge function or with plyr. I agree this example is not a bioinformatics question, but I recently had to do this for RNA-Seq count data, so I understand how you could use it.

These two answers are not bioinformatics specific, but they should help you:

  1. How do I combine two data-frames based on two columns?
  2. Merge a lot of data frames in R
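As a rough sketch of the merge approach (not taken from either linked answer), the two files can be joined on their ID and the paired columns compared. This assumes the row ID is available as a column, here called id, and uses only two columns of the question's data with the rows and columns of the second file deliberately reordered. The names both, shared and the .same columns are made up for illustration.

```r
# Subset of the question's data, with the row ID as an explicit column
file1 <- data.frame(id  = 1:3,
                    mpk = c("5,6", "10,19", "11,13"),
                    Jak = c("50,40", "40,44", "34,38"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(id  = c(3L, 1L, 2L),          # rows deliberately reordered
                    Jak = c("3,3", "50,40", "40,44"),
                    mpk = c("11,13", "8,8", "10,19"),
                    stringsAsFactors = FALSE)

# Join on id; columns present in both files get the suffixes .file1 / .file2
both <- merge(file1, file2, by = "id", suffixes = c(".file1", ".file2"))

# For each shared column, flag the IDs where the two files agree
shared <- setdiff(intersect(names(file1), names(file2)), "id")
for (col in shared) {
  both[[paste0(col, ".same")]] <-
    both[[paste0(col, ".file1")]] == both[[paste0(col, ".file2")]]
}
print(both)
```

Because the join is by ID and the comparison is by column name, the order of rows and columns in either file does not matter.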
9.8 years ago
Sam ★ 4.8k

If I understand you correctly, you are trying to do the following:

e.g. File A contains the content of File B

All columns in File B can be found in File A

For each column in File A that also appears in File B, you want to check whether the content is the same. If that is the case, you can do the following in R:

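A = read.table("FileA", header=TRUE)
B = read.table("FileB", header=TRUE)

# Loop over every column of B (ncol(B) on its own would visit only the last one)
for(i in seq_len(ncol(B))){
  # Exact name match; grep would also hit "mkkk" when looking for "mkk"
  correspondingACol = match(colnames(B)[i], colnames(A))
  # TRUE means the value in A occurs somewhere in that column of B (not necessarily in the same row)
  print(A[,correspondingACol] %in% B[,i])
}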
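Since %in% only tells you whether a value occurs anywhere in the column, here is a possible alternative sketch that reports exactly which ID/header cells differ. It assumes the IDs are used as row names (for real files, something like read.csv with row.names = 1 might do that, provided values such as "5,6" are quoted or the separator is not a comma). The data are typed in from the question; file2_aligned, differs and diff_cells are illustrative names.

```r
# Example data from the question, with the ID as row names
file1 <- data.frame(
  mpk  = c("5,6", "10,19", "11,13"),
  mkk  = c("22,22", "12,14", "15,16"),
  mkkk = c("18,19", "19,22", "22,22"),
  Rdh  = c("20,20", "34,50", "50,50"),
  Jak  = c("50,40", "40,44", "34,38"),
  row.names = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  mpk  = c("8,8", "10,19", "11,13"),
  mkk  = c("11,11", "12,14", "15,16"),
  mkkk = c("18,19", "19,22", "22,22"),
  Rdh  = c("20,20", "16,18", "50,50"),
  Jak  = c("50,40", "40,44", "3,3"),
  row.names = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
# Shuffle file2 to mimic a file whose rows/columns are in a different order
file2 <- file2[c("3", "1", "2"), c("Jak", "mpk", "Rdh", "mkk", "mkkk")]

# Put file2 back into the row/column order of file1, matching by name
file2_aligned <- file2[rownames(file1), colnames(file1)]

# Logical matrix: TRUE where the two files disagree
m1 <- as.matrix(file1)
m2 <- as.matrix(file2_aligned)
differs <- m1 != m2

# One row per differing cell, with the ID, the header, and both values
idx <- which(differs, arr.ind = TRUE)
diff_cells <- data.frame(
  id     = rownames(m1)[idx[, "row"]],
  column = colnames(m1)[idx[, "col"]],
  file1  = m1[idx],
  file2  = m2[idx],
  stringsAsFactors = FALSE
)
print(diff_cells)
```

Note that this sketch assumes every ID and header of the first file also exists in the second; a missing one would show up as NA after the alignment step.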

If, however, you want to match on multiple columns at once, i.e. a row only counts as a match if all of its columns in B can be found together in A, then I guess you can use the merge function.
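One way to read that suggestion, as a minimal sketch: when merge is called without a by argument, it joins on every column name the two data frames share, so a row is kept only if all shared columns agree. The example below uses two columns of the question's data plus an assumed id column; matched and unmatched_ids are illustrative names.

```r
# Subset of the question's data, with the row ID as an explicit column
file1 <- data.frame(id  = 1:3,
                    mpk = c("5,6", "10,19", "11,13"),
                    Rdh = c("20,20", "34,50", "50,50"),
                    stringsAsFactors = FALSE)
file2 <- data.frame(id  = 1:3,
                    mpk = c("8,8", "10,19", "11,13"),
                    Rdh = c("20,20", "16,18", "50,50"),
                    stringsAsFactors = FALSE)

# No `by` given: merge joins on all shared column names (id, mpk, Rdh),
# so only rows that are identical in every shared column are returned
matched <- merge(file1, file2)

# IDs in file2 that have no fully matching row in file1
unmatched_ids <- setdiff(file2$id, matched$id)

print(matched)
print(unmatched_ids)
```

This tells you which rows differ somewhere, but not in which column; for that, a per-column comparison is still needed.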
