Question

How to replace row names in one data frame with a column from another data frame

5.9 years ago

anamaria ▴ 220

Hi,

I have a dataframe which looks like this:

>head(candidate_gene_exprs1)
             DwC_1026_30mM.1 DwC_1026_30mM.2 DwC_1026_30mM.3 DwC_1026_norm.1 DwC_1026_norm.2 DwC_1026_norm.3 DwC_6009_30mM.1
ILMN_1762337        4.803651        4.531582        4.992943        5.077257        4.363542        4.520649        4.221391
ILMN_3241953        5.877695        6.127418        6.094072        5.815746        6.155299        5.859430        5.890158
ILMN_1735045        7.133542        7.251070        7.260408        7.253772        7.256665        7.169209        7.270749
ILMN_2375184        5.650717        5.889364        5.942529        5.768990        5.895180        5.833825        3.953553
ILMN_1659452        5.910706        6.170079        6.218347        5.975649        6.133477        6.132866        4.250749
ILMN_1755321        5.458909        5.232720        5.311850        5.759837        5.320741        5.112800        5.585782

And I would like to replace these ILMN row names with the corresponding values in the geneSymbol column of this data frame:

> head(mypro)
                           illumina_probe_id      geneSymbol
ILMN_1762337      ILMN_1762337      MACC1
ILMN_3241953      ILMN_3241953      GGACT
ILMN_1735045      ILMN_1735045     A4GALT
ILMN_2375184      ILMN_2375184  NPSR1-AS1
ILMN_1659452      ILMN_1659452  NPSR1-AS1
ILMN_1755321      ILMN_1755321       AAAS

I tried doing this:

ismr3 <- lapply(candidate_gene_exprs1, function(x){ row.names(x)<-as.character(mypro$geneSymbol)})

but I got:

 Error in `rownames<-`(x, value) : 
  attempt to set 'rownames' on an object with no dimensions

One of the obvious issues here is that some entries in mypro$geneSymbol are not unique, and that is not allowed in row names. My ultimate goal is to create a matrix where the geneSymbol names are the columns and the DwC_# names are the rows.

R • 10k views

updated 5.9 years ago by zx8754 12k • written 5.9 years ago by anamaria ▴ 220

score 1 · Answer 1 · 2019-06-12

5.9 years ago

swbarnes2 14k

Have you looked up the "merge" function? I think that's the safest way to make sure that everything ends up where it should. Once the data frames are merged, you can make any column the row names and delete what you don't want.
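A minimal sketch of the merge suggestion, assuming the probe IDs are the row names of the expression data frame and sit in the illumina_probe_id column of the annotation. The toy data frame `exprs` and the name `merged` are made up for illustration; the values are copied from the question.

```r
# Toy expression data (hypothetical name `exprs`), probes as row names
exprs <- data.frame(
  DwC_1026_30mM.1 = c(4.803651, 5.650717, 5.910706),
  DwC_1026_30mM.2 = c(4.531582, 5.889364, 6.170079),
  row.names = c("ILMN_1762337", "ILMN_2375184", "ILMN_1659452")
)

# Toy annotation table with the two columns shown in the question
mypro <- data.frame(
  illumina_probe_id = c("ILMN_1762337", "ILMN_2375184", "ILMN_1659452"),
  geneSymbol = c("MACC1", "NPSR1-AS1", "NPSR1-AS1"),
  stringsAsFactors = FALSE
)

# by.x = "row.names" tells merge() to use the row names of `exprs` as the key.
# Only probes present in both tables are kept; add all.x = TRUE to keep
# unmatched probes (their geneSymbol would then be NA).
merged <- merge(exprs, mypro, by.x = "row.names", by.y = "illumina_probe_id")

# merge() names the key column "Row.names"; rename it for clarity
names(merged)[1] <- "illumina_probe_id"
merged
```

Note that merge() sorts the result by the key, so the row order can differ from the original expression data.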


Sure, I will use the merge function, and that will give me a Gene column in my original data frame. But then how do I transform the merged data frame into a matrix whose columns are the entries of that Gene column and whose rows are the DwC_# samples?

5.9 years ago by anamaria ▴ 220

Have you looked up how to transpose a data table?
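A rough sketch of the transpose step, assuming every probe should be kept as its own column. The names `exprs`, `symbols` and `mat` are hypothetical. A matrix does tolerate duplicate column names, but looking a column up by name then returns only the first match, so make.unique() is used here to add a numeric suffix to repeated symbols.

```r
# Toy expression data (hypothetical name `exprs`), probes as row names
exprs <- data.frame(
  DwC_1026_30mM.1 = c(4.803651, 5.650717, 5.910706),
  DwC_1026_30mM.2 = c(4.531582, 5.889364, 6.170079),
  row.names = c("ILMN_1762337", "ILMN_2375184", "ILMN_1659452")
)
mypro <- data.frame(
  illumina_probe_id = c("ILMN_1762337", "ILMN_2375184", "ILMN_1659452"),
  geneSymbol = c("MACC1", "NPSR1-AS1", "NPSR1-AS1"),
  stringsAsFactors = FALSE
)

# Look up the gene symbol for each probe, in the row order of `exprs`
symbols <- mypro$geneSymbol[match(rownames(exprs), mypro$illumina_probe_id)]

# t() turns probes (rows) into columns and samples (columns) into rows
mat <- t(as.matrix(exprs))

# Disambiguate repeated symbols: "NPSR1-AS1", "NPSR1-AS1.1", ...
colnames(mat) <- make.unique(symbols)
mat
```

If some probes have no match, `symbols` will contain NA and those columns would need to be dropped or named some other way first.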

5.9 years ago by swbarnes2 14k

score 0 · Answer 2 · 2019-06-13

Keep geneSymbol as a column, not as row names. Row names must be unique, and your simple example already shows two probes assigned to the same gene (NPSR1-AS1), so using gene symbols as row names would fail. See the example below, where we keep geneSymbol as a new column.

Also, some probes might not match and result in NA (see "ILMN_1735045xx" in my example), which is again a problem, because NA cannot be assigned as a row name.

# example input
candidate_gene_exprs1 <- read.table(text = "
 DwC_1026_30mM.1
ILMN_1762337        4.803651
ILMN_3241953        5.877695
ILMN_1735045xx        7.133542
ILMN_2375184        5.650717
ILMN_1659452        5.910706
ILMN_1755321        5.458909"
, header = TRUE)


mypro <- read.table(text = "
illumina_probe_id      geneSymbol
ILMN_1762337      MACC1
ILMN_2375184  NPSR1-AS1
ILMN_1659452  NPSR1-AS1
ILMN_1755321       AAAS
ILMN_3241953      GGACT
ILMN_1735045     A4GALT
", header = TRUE, stringsAsFactors = FALSE)

# result
cbind(geneSymbol = mypro$geneSymbol[ match(rownames(candidate_gene_exprs1), mypro$illumina_probe_id) ],
      candidate_gene_exprs1)
#                geneSymbol DwC_1026_30mM.1
# ILMN_1762337        MACC1        4.803651
# ILMN_3241953        GGACT        5.877695
# ILMN_1735045xx       <NA>        7.133542
# ILMN_2375184    NPSR1-AS1        5.650717
# ILMN_1659452    NPSR1-AS1        5.910706
# ILMN_1755321         AAAS        5.458909
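If the end goal is a samples-by-genes matrix with one column per gene, one possible way forward (an assumption on my part, not something the question asks for) is to drop unmatched probes and average the probes that map to the same gene. A small sketch with made-up names (`exprs`, `symbol`, `gene_means`, `gene_mat`) and rounded toy values:

```r
# Toy expression data: five hypothetical probes, two hypothetical samples
exprs <- data.frame(
  s1 = c(4.8, 5.9, 7.1, 5.6, 5.9),
  s2 = c(4.5, 6.1, 7.3, 5.9, 6.2),
  row.names = c("p1", "p2", "p3", "p4", "p5")
)

# Gene symbol per probe, e.g. obtained with match(); NA means no annotation
symbol <- c("MACC1", "GGACT", NA, "NPSR1-AS1", "NPSR1-AS1")

# Drop probes without a gene symbol
keep <- !is.na(symbol)

# Average all probes that share a gene symbol, separately for each sample
gene_means <- aggregate(exprs[keep, , drop = FALSE],
                        by = list(geneSymbol = symbol[keep]),
                        FUN = mean)

# Transpose so that samples are rows and genes are columns
gene_mat <- t(as.matrix(gene_means[, -1, drop = FALSE]))
colnames(gene_mat) <- gene_means$geneSymbol
gene_mat
```

Whether averaging is appropriate depends on the analysis; keeping the most variable probe per gene is another common choice.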