I want to replace the index of the dataframe (gene symbols) with GENE_ID:GENE_VALUE
so that it can be used as the data matrix input for the netgsa R package (https://cran.r-project.org/web/packages/netgsa/vignettes/netgsa.html).
First, I retrieve the Entrez IDs:
library(org.Hs.eg.db)
library(AnnotationDbi)
# gene_value is the Entrez ID
gene_value <- as.data.frame(mapIds(org.Hs.eg.db, keys=rownames(meth_df), column="ENTREZID", keytype="SYMBOL"))
Traceback:
'select()' returned 1:many mapping between keys and columns
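For context, this line is an informational message rather than an error: as far as I know, mapIds() defaults to multiVals = "first", so each symbol still gets exactly one ID. The base-R sketch below only imitates that "keep the first match" behaviour on a hypothetical lookup table (lookup, GENE_A and the ID values are made-up placeholders, not real annotations), so it runs without Bioconductor.

```r
# Hypothetical long-format lookup in which one symbol maps to two IDs.
# All names and values here are placeholders, not real annotations.
lookup <- data.frame(
  SYMBOL   = c("GENE_A", "GENE_B", "GENE_B", "GENE_C"),
  ENTREZID = c("101", "202", "203", "304"),
  stringsAsFactors = FALSE
)

# Keep only the first ID seen for each symbol (a 1:many map becomes 1:1).
first_hit <- lookup[!duplicated(lookup$SYMBOL), ]

# Named character vector: names are symbols, values are IDs.
entrez <- setNames(first_hit$ENTREZID, first_hit$SYMBOL)
entrez
```

With the real call, wrapping mapIds() in suppressMessages() should silence the note if it is distracting.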
Then, I want to prepend the string ENTREZID: to the gene_value variable.
rownames(meth_df) <- paste0("ENTREZID:", gene_value)
Traceback:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
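One likely cause of the length error, sketched with toy data: gene_value is a data frame, and paste0() treats a data frame as a list of columns, so it returns one string per column instead of one per gene. The three IDs below are typed by hand as a stand-in for the mapIds() result and are only there to show the lengths.

```r
# Toy stand-in for the mapIds() result wrapped in as.data.frame():
# one column, one row per gene (IDs typed by hand for illustration).
gene_value <- as.data.frame(c(A1BG = "1", A1CF = "29974", A2M = "2"))

# Pasting onto the whole data frame yields one string per column.
labels_from_df <- paste0("ENTREZID:", gene_value)
length(labels_from_df)

# Pasting onto the column itself yields one label per gene.
labels_from_col <- paste0("ENTREZID:", gene_value[[1]])
length(labels_from_col)
```

If that is what is happening, either dropping the as.data.frame() wrapper or pasting onto gene_value[[1]] should give a vector of the right length.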
Expected rownames output (example):
## [1] "ENTREZID:127550" "ENTREZID:53947" "ENTREZID:65985" "ENTREZID:51166"
## [5] "ENTREZID:15" "ENTREZID:60496"
Example data:
> dput(meth_df[1:5,1:5])
structure(list(`TCGA-2K-A9WE-01A` = c(0.611033076810465, 0.786837244239289,
0.531054614303851, 0.711916183761331, 0.758443223998425), `TCGA-2Z-A9J1-01A` = c(0.468013052647261,
0.386177267500376, 0.508623627469028, 0.403601275088479, 0.754642399207848
), `TCGA-2Z-A9J2-01A` = c(0.593559707995411, 0.54983504208745,
0.535207192925841, 0.613971903755576, 0.717278085189431), `TCGA-2Z-A9J3-01A` = c(0.638211007873003,
0.319561448644096, 0.526699541432941, 0.450002172806716, 0.736440001203422
), `TCGA-2Z-A9J5-01A` = c(0.603998109440889, 0.638039512259872,
0.584328151056768, 0.594021097192165, 0.818583455926719)), row.names = c("A1BG",
"A1CF", "A2BP1", "A2LD1", "A2M"), class = "data.frame")
I get no error with your code and your example. What is your package.version("AnnotationDbi")?
The package version for AnnotationDbi is "1.58.0".
You changed the post, and now you have no error. Some genes will have NA as their ENTREZID, so you will need to drop the genes with no ENTREZID from both your gene_value list and meth_df.
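A minimal sketch of that filtering step, assuming the mapIds() result is kept as a named vector rather than a data frame. The entrez vector is a hand-typed stand-in (NA marks symbols with no ID), the methylation values are rounded from the example data, and sample_1 / sample_2 are hypothetical column names.

```r
# Stand-in for the named vector returned by mapIds(); NA = no Entrez ID.
# Values are typed by hand for illustration.
entrez <- c(A1BG = "1", A1CF = "29974", A2BP1 = NA, A2LD1 = NA, A2M = "2")

# Small stand-in for meth_df with the same gene symbols as row names.
meth_df <- data.frame(
  sample_1 = c(0.61, 0.79, 0.53, 0.71, 0.76),
  sample_2 = c(0.47, 0.39, 0.51, 0.40, 0.75),
  row.names = names(entrez)
)

# Align IDs to the rows, then keep genes that have an ID and only the
# first row for any repeated ID (row names must be unique and non-missing).
ids  <- entrez[rownames(meth_df)]
keep <- !is.na(ids) & !duplicated(ids)

# Subset the data and relabel the rows in one pass so they stay in sync.
meth_mat <- as.matrix(meth_df[keep, , drop = FALSE])
rownames(meth_mat) <- paste0("ENTREZID:", ids[keep])
rownames(meth_mat)
```

Filtering the IDs and the rows with the same logical vector keeps the two in step; the conversion to a matrix is there because the netgsa vignette describes its input as a data matrix.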
It still gives the same error. meth_df does not have missing genes (index values).
You changed the error you had for mapIds; now you have gene_list, which is a vector containing the ENTREZID for each of your input genes. What I would do is merge both pieces of information, remove the genes with no ENTREZID, and change the rownames as expected:
gene_value produced the error 'select()' returned 1:many mapping between keys and columns. So I need to solve this error first before I can proceed with your recommendation.
I don't understand. You first said that you had this error:
Error in mapIds_base(x, keys, column, keytype, ..., multiVals = multiVals) : mapIds must have at least one key to match against.
After you modified your post, it worked and resulted in the warning message
'select()' returned 1:many mapping between keys and columns
but it is possible to deal with that. Now the first error is in your code again, so why is it changing?
Sorry for the confusion. I edited my comment above. Error code is as follows: