NAs using org.Hs.eg.db for Ensembl ID to Gene Symbol annotation
2.0 years ago
Jakpa ▴ 50

Hi Everyone,

I did Ensembl ID to Gene Symbol annotation using org.Hs.eg.db with this code:

library(org.Hs.eg.db)  # also attaches AnnotationDbi, which provides mapIds()

annotn = mapIds(org.Hs.eg.db, keys = rownames(res),
                keytype = "ENSEMBL", column = "SYMBOL",
                multiVals = "first")

res is a gene expression data frame with more than 50,000 Ensembl IDs as row names. After running the annotation, about 45% of the results are NA, i.e. those IDs were not assigned any gene symbol.

Is it that org.Hs.eg.db could not map these IDs because of the dataset, or is my syntax not quite correct?

How do I fix this? Are there other options?

I don't want to delete the Ensembl IDs with NAs.

regards,

RNASeq annotation expression Gene R • 1.3k views

Hi! Can you show some IDs (rownames(res))?


iraun,

'ENSG00000288663', 'ENSG00000288667', 'ENSG00000288669', 'ENSG00000288670', 'ENSG00000288674', 'ENSG00000288675'


Not all Ensembl IDs have an associated gene name, and some genes have a single gene name but multiple gene IDs (the joys of gene identifiers).

If you want the best chance of mapping your gene IDs to gene symbols, make sure you are using the same Ensembl release that the gene expression data was generated from. If there isn't an org database associated with that release, you could use biomaRt to convert the IDs instead.
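
Since some IDs will stay unmapped whichever release is used, one option is to keep every row and fall back to the Ensembl ID wherever no symbol was found. The following is only a minimal sketch in base R: the small annotn vector is a hypothetical stand-in for what mapIds() returns (a character vector named by Ensembl ID), and the ID ENSG99999999999 is a made-up placeholder, not a real gene.

```r
# Hypothetical stand-in for the result of mapIds(): names are Ensembl IDs,
# values are gene symbols, NA where no symbol was found.
annotn <- c(ENSG00000141510 = "TP53",
            ENSG99999999999 = NA,      # made-up ID with no symbol
            ENSG00000012048 = "BRCA1")

# Keep every ID: use the symbol when present, otherwise the Ensembl ID itself.
label <- unname(ifelse(is.na(annotn), names(annotn), annotn))

# Count how many IDs had no symbol, for reporting.
n_unmapped <- sum(is.na(annotn))

print(label)
print(n_unmapped)
```

With real data, label could be stored as an extra column (for example res$label) so that no row has to be dropped.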


rpolicastro, thanks for your response. Initially, I did this using biomaRt:

library(biomaRt)

# Look up HGNC symbols for a vector of Ensembl gene IDs.
ensid_symbol <- function(ids) {
  mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
  genes <- getBM(filters = "ensembl_gene_id",
                 attributes = c("ensembl_gene_id", "hgnc_symbol"),
                 values = ids, mart = mart)
  return(genes)
}

df <- ensid_symbol(row.names(res_output))

# Copy the row names into a column so both tables share a key, then merge.
result_df <- as.data.frame(res_output)
result_df$ensembl_gene_id <- row.names(result_df)
result_df <- merge(df, result_df, by = "ensembl_gene_id")
# Sort by absolute log2 fold change, then adjusted p-value (both decreasing).
resOrdered <- result_df[with(result_df, order(abs(log2FoldChange), padj, decreasing = TRUE)), ]

and I got an empty data frame: column names only, no observations.

I tried to solve the problem, but I couldn't.

Please, can you spot any error in the code?
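
An empty result after merge() usually means either that the lookup table has no rows or that its keys do not overlap with the row names. The cause in this thread is not established. The sketch below shows how one might check both conditions and then merge with all.y = TRUE so that unmatched Ensembl IDs are kept with an NA symbol. The two small data frames are hypothetical stand-ins for the getBM() result and the expression table, and ENSG99999999999 is a made-up ID.

```r
# Hypothetical stand-ins for the getBM() result and the results table.
df <- data.frame(ensembl_gene_id = c("ENSG00000141510", "ENSG00000012048"),
                 hgnc_symbol = c("TP53", "BRCA1"))
res_output <- data.frame(log2FoldChange = c(1.5, -2.0, 0.3),
                         padj = c(0.01, 0.001, 0.5),
                         row.names = c("ENSG00000141510",
                                       "ENSG00000012048",
                                       "ENSG99999999999"))

# 1. Did the lookup return any rows, and do its keys overlap the row names?
n_lookup <- nrow(df)
n_shared <- length(intersect(df$ensembl_gene_id, row.names(res_output)))

# 2. Merge while keeping every row of the results table (all.y = TRUE);
#    IDs without a symbol get NA instead of being dropped.
result_df <- res_output
result_df$ensembl_gene_id <- row.names(result_df)
result_df <- merge(df, result_df, by = "ensembl_gene_id", all.y = TRUE)

print(c(lookup_rows = n_lookup, shared_ids = n_shared))
print(result_df)
```

If n_lookup or n_shared is zero on the real data, the problem lies in the lookup step or in the ID format rather than in the merge itself.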
