Hi,
I have a data frame of gene expression results that looks like this:
I want to map the Ensembl IDs to gene names/symbols using org.Hs.eg.db
with this code:
res_df$symbol = mapIds(org.Hs.eg.db, keys = rownames(res_df),
keytype = "ENSEMBL", column = "SYMBOL")
I got this error:
Error in .testForValidKeys(x, keys, keytype, fks): None of
the keys entered are valid keys for 'ENSEMBL'. Please use the keys
method to see a listing of valid arguments
However, I saw a similar post that attributes this to the decimal version suffix
at the end of the Ensembl ID,
and I tried fixing it with:
res_df=gsub("\\..*","",row.names(res_df))
It did not give the required output. Then I realized that the Ensembl ID
column does not have a name. I tried to name it like this: names(res_df)[0] <- "EsemblId"
but the output remained the same.
Now, I have more than 50,000
rows. How do I write code in R to remove the decimal point and the digits after it from each Ensembl ID?
I think if I am able to do that, my first code will work, based on the previous post that I read.
Regards,
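For the version-suffix problem described in the question, a minimal base R sketch might look like the following. The data frame is made-up toy data: the version suffixes on the IDs and all numeric values are placeholders, not taken from the original table. The point it illustrates is that the cleaned IDs should be assigned back to the row names (or to a column), not to `res_df` itself, since that would replace the whole data frame with a character vector.

```r
# Toy stand-in for the real results table; suffixes and values are hypothetical.
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.8, 0.3),
  pvalue = c(0.01, 0.20, 0.05),
  row.names = c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14")
)

# Remove the first dot and everything after it from each row name.
clean_ids <- sub("\\..*$", "", rownames(res_df))

# Keep the cleaned IDs as an ordinary column and also as the row names.
res_df$ensembl_id <- clean_ids
rownames(res_df) <- clean_ids

print(res_df)
```

If two versioned IDs collapse to the same stripped ID, the `rownames<-` step will fail because row names must be unique; in that case, keeping the cleaned IDs only in the `ensembl_id` column and passing that column as `keys` is one way around it.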
rpolicastro, thank you for your response. Your code seems to work, but the result does not seem to persist. I noticed that a few minutes after getting the output I want, if I run it again, it gives an error like this:
'select()' returned 1:many mapping between keys and columns
then, this output:
ENSG00000000003'TSPAN6'ENSG00000000005'TNMD'ENSG00000000419'DPM1'ENSG00000000457'SCYL3'ENSG00000000460'C1orf112'ENSG00000000938'FGR'ENSG00000000971'CFH'ENSG00000001036'FUCA2'ENSG00000001084'GCLC'ENSG00000001167'NFYA'ENSG00000001460'STPG1'ENSG00000001461'NIPAL3'ENSG00000001497'LAS1L'ENSG00000001561'ENPP4'ENSG00000001617'SEMA3F'ENSG00000001626'CFTR'ENSG00000001629'ANKIB1'
instead of genesymbol as a column alongside the other variables such as pvalue, Log2Foldchange, etc. Also, the majority of the genesymbol values are NAs.
Please, any idea on how to resolve this?
Most genes as annotated by Ensembl do not have gene symbols, so when you fetch them, the NAs effectively mean "this gene does not have a name".
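As a rough illustration of turning the named vector shown in the comment into an ordinary column, and of checking how many IDs have no symbol, here is a base R sketch. The `symbols` vector is a hand-written stand-in for what `mapIds()` returns (a character vector named by the input keys); the third ID, its `NA`, the `label` column, and the numeric values are hypothetical.

```r
# Hand-written stand-in for a mapIds() result; the third entry is hypothetical.
symbols <- c(
  ENSG00000000003 = "TSPAN6",
  ENSG00000000005 = "TNMD",
  ENSG00000999999 = NA_character_
)

# Toy results table keyed by the same IDs; values are placeholders.
res_df <- data.frame(
  log2FoldChange = c(1.2, -0.8, 0.3),
  pvalue = c(0.01, 0.20, 0.05),
  row.names = names(symbols)
)

# Look up each row's symbol by ID and drop the names so it is a plain column.
res_df$symbol <- unname(symbols[rownames(res_df)])

# Count IDs without a symbol, then fall back to the Ensembl ID as a label.
n_missing <- sum(is.na(res_df$symbol))
res_df$label <- ifelse(is.na(res_df$symbol), rownames(res_df), res_df$symbol)

print(n_missing)
print(res_df)
```

The `'select()' returned 1:many mapping between keys and columns` line is an informational message rather than an error: `mapIds()` defaults to `multiVals = "first"`, so each key still gets a single value.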