I know this question has been asked in various forms before, and it seems straightforward, but I can't figure out how to get it to work. I've tried various things and spent a lot of time on it.
I have a gene count matrix with cells as columns and mouse Ensembl IDs as row names:
[,1]
ENSMUSG00000104352 0
ENSMUSG00000104046 0
ENSMUSG00000102907 0
ENSMUSG00000025905 0
ENSMUSG00000103936 0
ENSMUSG00000093015 0
I tried something like this:
rownames(counts) <- mapIds(org.Mm.eg.db, keys = rownames(counts), column = "SYMBOL", keytype = "ENSEMBL", multiVals = "first")
But the issue I run into is that I get many NAs, because not every Ensembl ID maps to a gene name. Also, some gene names have multiple Ensembl IDs mapping to them.
So if I run the code above, I get this output:
[,1]
<NA> 0
Gm26206 0
Xkr4 0
Gm18956 0
<NA> 0
<NA> 0
<NA> 0
<NA> 0
<NA> 0
Gm7341 0
I saw this response, which suggests keeping the Ensembl ID whenever the gene name is NA, but it didn't work because some gene names are duplicated and the matrix can't have duplicate row names:
R: converting Ensembl row names to Symbol ID outputs missing values in 'row.names' are not allowed
Can someone point me in the right direction on how to deal with the NAs and duplicates?
The goal is to replace the row names with gene names, so that when I do my downstream Seurat work I don't have to keep looking up Ensembl IDs.
You have duplicate IDs in your matrix?
Sorry. I'll change the phrasing. Every Ensembl ID is unique but multiple Ensembl IDs map to the same gene name.
You may want to take a look at "Multiple ensembl gene ID for the same gene name (Symbol), how to deal with this while differential analysis?" and the comments/links within.
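One possible way to handle both problems at once, as a minimal sketch rather than a definitive recipe: keep the Ensembl ID wherever no symbol comes back, then pass the result through base R's make.unique() so repeated symbols get a numeric suffix. The helper name symbols_or_ids and the toy symbols below are made up for illustration; with real data the symbols vector would come from the mapIds() call in the question.

```r
# Hypothetical helper (not from any package): build NA-free, unique row names
# from Ensembl IDs and their possibly missing or repeated gene symbols.
symbols_or_ids <- function(ensembl_ids, symbols) {
  symbols <- unname(as.character(symbols))
  # Fall back to the Ensembl ID where no symbol was found.
  missing <- is.na(symbols) | symbols == ""
  symbols[missing] <- ensembl_ids[missing]
  # Disambiguate repeated symbols: "GeneA", "GeneA.1", ...
  make.unique(symbols)
}

# Toy matrix using IDs from the question. The symbols are invented for this
# example and are NOT the real mappings of these IDs.
counts <- matrix(0, nrow = 5, ncol = 1,
                 dimnames = list(c("ENSMUSG00000104352", "ENSMUSG00000104046",
                                   "ENSMUSG00000102907", "ENSMUSG00000025905",
                                   "ENSMUSG00000103936"), NULL))
toy_symbols <- c(NA, "GeneA", "GeneB", "GeneA", NA)

# With real data, replace toy_symbols with the result of:
# mapIds(org.Mm.eg.db, keys = rownames(counts), column = "SYMBOL",
#        keytype = "ENSEMBL", multiVals = "first")
rownames(counts) <- symbols_or_ids(rownames(counts), toy_symbols)
print(rownames(counts))
```

This keeps every row and only changes the labels. If you would rather merge rows that share a symbol, base R's rowsum() can sum them by group, but whether summing counts across Ensembl IDs is appropriate depends on your analysis.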