Extract the values by matching two rows of one dataframe with the two columns of another dataframe
1
2
Entering edit mode
9.7 years ago
MAPK ★ 2.1k

Hi everyone,

I have a programming question and I want this to be done in R. I have two data frames. The df1 has first three columns as header line and the file is in xlsx format. The second data frame has first line as a header. I want to search the genes from the first line of df1 along with their corresponding mutation to match the genes and mutation in df2 and extract the corresponding values. I want the corresponding values to be inserted in the sample line (I have indicated those columns with *). Also I am not concerned anything with the columns indicated by X. I am a beginner in programming and would really appreciate if you guys could help me get this done. Thank you.

df1:

Gene         MPK       MPKK     MPKKK
Mutation     D888H     T66S     D44G
Sample       *         *        *
1            X         X        X
2            X         X        X
3            X         X        X

df2:

Gene      Values     Mutation
MPK       211        D888H
MPKK      443        T66S
MPK       44         D334G
MPKK      222        S434G
MPKKK     223        D44G

I want this Output:

Gene         MPK       MPKK     MPKKK
Mutation     D888H     T66S     D44G
Sample       211       443      223
1            X         X        X
2            X         X        X
3            X         X        X
dataframe match • 44k views
ADD COMMENT
1
Entering edit mode
9.7 years ago
Phil S. ▴ 700

Hi,

This is quick and dirty code. There is, for sure, a way more elegant way of doing this in R. Since I am in a hurry this should get you started for the table structure you provided...

> df1 <- read.table("df1.txt")
> df2 <- read.table("df2.txt")

> df1
        V1    V2   V3    V4
1     Gene   MPK MPKK MPKKK
2 Mutation D888H T66S  D44G
3   Sample     *    *     *
4        1     X    X     X
5        2     X    X     X
6        3     X    X     X

> df2
     V1     V2       V3
1  Gene Values Mutation
2   MPK    211    D888H
3  MPKK    443     T66S
4   MPK     44    D334G
5  MPKK    222    S434G
6 MPKKK    223     D44G


for(i in 2:ncol(df1)){
+ tmp <- df2[which(df2[,1]==df1[1,i]),]
+ df1[3,i] <- tmp[which(tmp[,3]==df1[2,i]),2]
+ }
> df1
        V1    V2   V3    V4
1     Gene   MPK MPKK MPKKK
2 Mutation D888H T66S  D44G
3   Sample   211  443   223
4        1     X    X     X
5        2     X    X     X
6        3     X    X     X

Feel free to modify etc...

HTH,
Phil

ADD COMMENT
2
Entering edit mode

A 2 line (that could be 1) version:

idx <- match(paste(colnames(df1),df1[1,]), paste(df2$Gene,df2$Mutation))
df1[2,2:4] <- df2$Values[idx[-1]]

For really large matches, inner_join/left_join/etc. from dplyr are more efficient. Note that I used the first line as a header, though one could easily modify things were that not the case.

ADD REPLY
0
Entering edit mode

Hi everyone

Could you please help me on this error, after running above for loop I have below error:

Error in Ops.factor(df2[, 1], df1[1, i]) : 
  level sets of factors are different

Any help would appreciated!

Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6