I have a .csv file as follows:
,TEST1,TEST2
ENSG00000197421,2,0
ENSG00000213753,0,2
ENSG00000168746,0,2
ENSG00000261824,3,0
ENSG00000128310,1,2
ENSG00000235091,9,4
In R, I import the file like this:
> d <- read.csv("my_file.csv", header=TRUE, row.names=1)
> d
                TEST1 TEST2
ENSG00000197421     2     0
ENSG00000213753     0     2
ENSG00000168746     0     2
ENSG00000261824     3     0
ENSG00000128310     1     2
ENSG00000235091     9     4
Checking that I do not have any duplicates:
> rownames(d)
[1] "ENSG00000197421" "ENSG00000213753" "ENSG00000168746" "ENSG00000261824"
[5] "ENSG00000128310" "ENSG00000235091"
> colnames(d)
[1] "TEST1" "TEST2"
> any(duplicated(rownames(d)))
[1] FALSE
> any(duplicated(colnames(d)))
[1] FALSE
Load libraries:
> suppressMessages(library("AnnotationDbi"))
> suppressMessages(library("org.Hs.eg.db"))
Then I try to convert my Ensembl row names to gene symbols in place:
> rownames(d) <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
missing values in 'row.names' are not allowed
NOTE: Removing the leading ',' from the header of 'my_file.csv' did not help either.
I managed to add a new column with the converted IDs, but I cannot use it as the row names:
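A minimal base-R sketch of the failure, assuming mapIds() returns a named character vector with NA for any Ensembl ID it cannot map (as the SYMBOL column below shows for ENSG00000235091). The vector symbols is typed in by hand as a stand-in for the mapIds() result, so no Bioconductor packages are needed. It suggests the NA comes from the mapping rather than from the CSV header, which would explain why editing the file changes nothing.

```r
# Hand-typed stand-in for the mapIds() result (values copied from the
# SYMBOL column in the question); one ID has no symbol, hence NA
symbols <- c(ENSG00000197421 = "GGT3P",
             ENSG00000213753 = "CENPBD1P1",
             ENSG00000168746 = "LINC01620",
             ENSG00000261824 = "LINC00662",
             ENSG00000128310 = "GALR3",
             ENSG00000235091 = NA)

d <- data.frame(TEST1 = c(2, 0, 0, 3, 1, 9),
                TEST2 = c(0, 2, 2, 0, 2, 4),
                row.names = names(symbols))

# Check the values that are about to become row names, not the old ones
any(is.na(symbols))                         # TRUE: at least one ID was not mapped
names(symbols)[is.na(symbols)]              # which Ensembl ID(s) have no symbol
any(duplicated(symbols[!is.na(symbols)]))   # FALSE for this data

# Assigning a vector that contains NA reproduces the error from the question;
# d itself is left unchanged because the assignment fails
msg <- tryCatch({
  rownames(d) <- symbols
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```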
> d$SYMBOL <- mapIds(org.Hs.eg.db,keys=rownames(d),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
> d
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
ENSG00000235091     9     4      <NA>
> d_subset <- subset(d, !is.na(d$SYMBOL))
> d_subset
                TEST1 TEST2    SYMBOL
ENSG00000197421     2     0     GGT3P
ENSG00000213753     0     2 CENPBD1P1
ENSG00000168746     0     2 LINC01620
ENSG00000261824     3     0 LINC00662
ENSG00000128310     1     2     GALR3
> rownames(d) <- d$SYMBOL
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
missing values in 'row.names' are not allowed
I don't get it.
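Here, "missing values" means NA values, which cannot be used as row names. You need to convert them to unique names (because duplicate row names are not allowed). Note also that your last command assigns to d, not d_subset: d still contains the NA for ENSG00000235091, so it fails with the same error, whereas rownames(d_subset) <- d_subset$SYMBOL would work because d_subset has no NAs and its symbols are unique.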
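Two possible ways to get valid row names, sketched in base R. The vector symbols is a hand-typed stand-in for the mapIds() output, and the names symbols, new_names and d2 are illustrative only. Option 1 drops unmapped rows and sets the row names on the subset itself rather than on d. Option 2 keeps every row, falls back to the Ensembl ID where no symbol was found, and uses make.unique() to guard against one symbol being mapped from several Ensembl IDs.

```r
# Hand-typed stand-in for the mapIds() result used in the question
symbols <- c(ENSG00000197421 = "GGT3P",
             ENSG00000213753 = "CENPBD1P1",
             ENSG00000168746 = "LINC01620",
             ENSG00000261824 = "LINC00662",
             ENSG00000128310 = "GALR3",
             ENSG00000235091 = NA)
d <- data.frame(TEST1 = c(2, 0, 0, 3, 1, 9),
                TEST2 = c(0, 2, 2, 0, 2, 4),
                row.names = names(symbols))
d$SYMBOL <- unname(symbols)

# Option 1: drop rows without a symbol, then rename the subset (not d)
d_subset <- subset(d, !is.na(SYMBOL))
rownames(d_subset) <- make.unique(d_subset$SYMBOL)

# Option 2: keep all rows; use the Ensembl ID where the symbol is NA,
# then make any repeated names unique before assigning them
new_names <- ifelse(is.na(d$SYMBOL), rownames(d), d$SYMBOL)
d2 <- d
rownames(d2) <- make.unique(new_names)
```

For reference, make.unique(c("A", "A", "B")) returns "A" "A.1" "B". Whether a suffixed symbol or the original Ensembl ID is the better row name depends on what the table is used for downstream.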