Ensembl gene ID with characters
2.1 years ago
Jakpa ▴ 50

Hi,

I have a df of gene expression that looks like this:

[Image: expre]

I want to map the Ensembl IDs to gene names/symbols using org.Hs.eg.db with this code:

res_df$symbol = mapIds(org.Hs.eg.db, keys = rownames(res_df),
                   keytype = "ENSEMBL", column = "SYMBOL")

I got this error:

Error in .testForValidKeys(x, keys, keytype, fks): None of 
the keys entered are valid keys for 'ENSEMBL'. Please use the keys 
method to see a listing of valid arguments

I saw a similar post that relates this error to the decimal at the end of the Ensembl ID, and I tried fixing it with:

res_df=gsub("\\..*","",row.names(res_df))

It did not give the required output. Then I realized that the Ensembl ID column does not have a name. I tried to name it like this: names(res_df)[0] <- "EsemblId", but the output remained the same.

I have more than 50,000 rows. How do I write code in R to remove the decimal point and the numbers after it from the Ensembl IDs?

I think that if I am able to do that, my first code will work, based on the previous post that I read.

Regards,

2.1 years ago
rownames(res_df) <- gsub("\\.[0-9]+$", "", rownames(res_df))

Or, if you prefer the tidyverse:

library("stringr")

rownames(res_df) <- str_remove(rownames(res_df), "\\.[0-9]+$")
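As a quick check, here is a minimal sketch of the base R version on a toy data frame. The versioned IDs and fold-change values are made up for illustration and only stand in for the real res_df. Note that the result is assigned to rownames(res_df); assigning it to res_df itself would replace the whole data frame with a character vector.

```r
# Toy stand-in for res_df: versioned Ensembl IDs as row names (values are made up)
ids <- c("ENSG00000000003.15", "ENSG00000000005.6", "ENSG00000000419.14")
res_df <- data.frame(log2FoldChange = c(1.2, -0.4, 0.1), row.names = ids)

# Drop the trailing ".<digits>" version suffix from each ID
stripped <- sub("\\.[0-9]+$", "", rownames(res_df))

# Row names must be unique, so stop if stripping the version creates duplicates
stopifnot(!anyDuplicated(stripped))

# Assign to the row names only; res_df stays a data frame
rownames(res_df) <- stripped
print(rownames(res_df))
```

If the duplicate check fails on real data, the IDs probably need to go into an ordinary column instead of the row names.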

rpolicastro, thank you for your response. Your code seems to work, but the result only seems to be valid for a while. A few minutes after getting the output that I want, if I run it again, I get this:

res= mapIds(org.Hs.eg.db, keys = rownames(res),
                   keytype = "ENSEMBL", column = "SYMBOL",
                   multiVals = "first")

'select()' returned 1:many mapping between keys and columns

Then I get this output:

ENSG00000000003'TSPAN6'ENSG00000000005'TNMD'ENSG00000000419'DPM1'ENSG00000000457'SCYL3'ENSG00000000460'C1orf112'ENSG00000000938'FGR'ENSG00000000971'CFH'ENSG00000001036'FUCA2'ENSG00000001084'GCLC'ENSG00000001167'NFYA'ENSG00000001460'STPG1'ENSG00000001461'NIPAL3'ENSG00000001497'LAS1L'ENSG00000001561'ENPP4'ENSG00000001617'SEMA3F'ENSG00000001626'CFTR'ENSG00000001629'ANKIB1'

instead of a gene symbol column alongside the other variables such as p-value, log2FoldChange, etc. Also, the majority of the gene symbols are NA.

Please, any idea on how to resolve this?


Most genes as annotated by Ensembl do not have gene symbols, so when you fetch them, the NAs effectively mean "this gene does not have a name".
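On the other part of the question: mapIds returns a named character vector, so assigning its result to res replaces the data frame, which would explain the output shown above. The "1:many mapping" line is an informational message, not an error. Below is a minimal sketch of keeping the table and adding the symbols as a column. To stay self-contained it uses a hand-written named vector as a stand-in for the mapIds result; the third ID and the fold-change values are hypothetical.

```r
# Toy results table; the third ID is hypothetical and has no symbol
res <- data.frame(
  log2FoldChange = c(1.2, -0.4, 0.1),
  row.names = c("ENSG00000000003", "ENSG00000000005", "ENSG99999999999")
)

# Stand-in for the named character vector that
# mapIds(org.Hs.eg.db, keys = rownames(res), keytype = "ENSEMBL",
#        column = "SYMBOL", multiVals = "first") would return
symbols <- c(ENSG00000000003 = "TSPAN6",
             ENSG00000000005 = "TNMD",
             ENSG99999999999 = NA)

# Store the symbols in a new column instead of overwriting `res`
res$symbol <- unname(symbols[rownames(res)])

# Count how many IDs have no symbol
n_missing <- sum(is.na(res$symbol))
print(res)
```

Rows with an NA symbol can be kept and labelled by their Ensembl ID, or filtered out with res[!is.na(res$symbol), ], depending on the downstream analysis.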
