Some Ensembl IDs seem to map to more than one gene ID, which doesn't make sense to me. My code:
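```r
library(clusterProfiler)  # provides bitr()
library(org.Mm.eg.db)     # mouse annotation package used as OrgDb

## Merge gene ID columns
gene_entrezid <- bitr(geneID = rownames(P_change),
                      fromType = "ENSEMBL",
                      toType = "SYMBOL",   # use "ENTREZID" here to get Entrez IDs instead
                      OrgDb = "org.Mm.eg.db")

ID_bind <- function(a) {
  a2 <- cbind(a, rownames(a))            # add the Ensembl IDs as an extra column
  colnames(a2)[ncol(a2)] <- "ENSEMBL"    # name that new (last) column
  a2 <- merge(a2, gene_entrezid, by = "ENSEMBL", all.y = FALSE)
  rownames(a2) <- a2$ENSEMBL             # fails if merge() produced repeated IDs
  a2 <- a2[, -1]                         # drop the ENSEMBL column
  return(a2)
}
P_change <- ID_bind(P_change)
```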
Error:
```
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate row.names are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘ENSMUSG00000000486’, ‘ENSMUSG00000000562’, ‘ENSMUSG00000001768’, ‘ENSMUSG00000002250’, ‘ENSMUSG00000003271’, ‘ENSMUSG00000003680’, ‘ENSMUSG00000003812’, ‘ENSMUSG00000004455’, ‘ENSMUSG00000005983’, ‘ENSMUSG00000015290’, ‘ENSMUSG00000015341’, ‘ENSMUSG00000015882’, ‘ENSMUSG00000018378’, ‘ENSMUSG00000019865’, ‘ENSMUSG00000019868’, ‘ENSMUSG00000021557’, ‘ENSMUSG00000021846’, ‘ENSMUSG00000021983’, ‘ENSMUSG00000022820’, ‘ENSMUSG00000023156’, ‘ENSMUSG00000024571’, ‘ENSMUSG00000025194’, ‘ENSMUSG00000025646’, ‘ENSMUSG00000027022’, ‘ENSMUSG00000028700’, ‘ENSMUSG00000029089’, ‘ENSMUSG00000029592’, ‘ENSMUSG00000029723’, ‘ENSMUSG00000030337’, ‘ENSMUSG00000031167’, ‘ENSMUSG00000032750’, ‘ENSMUSG00000032872’, ‘ENSMUSG00000035171’, ‘ENSMUSG00000036381’, ‘ENSMUSG00000037747’, ‘ENSMUSG00000038209’, ‘ENSMUSG0000004019 [... truncated]
```
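To illustrate how this error arises, here is a minimal base-R sketch with made-up data. The IDs `ENSMUSG_A` to `ENSMUSG_C` and the symbols are hypothetical placeholders, not real genes. It assumes, as the repeated IDs in the warning suggest, that `gene_entrezid` contains more than one row for some Ensembl IDs: `merge()` then repeats the matching expression rows, and using those IDs as row names fails with the same error.

```r
# Toy expression table with Ensembl IDs as row names (hypothetical values)
counts <- data.frame(
  logFC = c(1.2, -0.5, 2.0),
  row.names = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C")
)

# Toy mapping with one row per (ENSEMBL, SYMBOL) pair;
# ENSMUSG_B maps to two symbols, so it appears twice
mapping <- data.frame(
  ENSEMBL = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_B", "ENSMUSG_C"),
  SYMBOL  = c("GeneA", "GeneB1", "GeneB2", "GeneC")
)

counts$ENSEMBL <- rownames(counts)
merged <- merge(counts, mapping, by = "ENSEMBL")

nrow(counts)  # 3 input rows
nrow(merged)  # 4 rows: the ENSMUSG_B row was repeated by the one-to-many match

# The step that fails inside ID_bind(); run on a copy and capture the error
res <- tryCatch(
  {
    tmp <- merged
    rownames(tmp) <- tmp$ENSEMBL
    "row names set"
  },
  error = function(e) conditionMessage(e)
)
print(res)
```

So the duplicates are not created by `rownames()` itself; they come from the mapping table, and the row-name assignment is just where they become visible.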
I searched for the corresponding genes on the official Ensembl website, but strangely, searching for these gene IDs also returns genes with completely different Ensembl IDs.
I am confused about two things. First, how does this happen? Second, if I still want to attach gene IDs to transcriptome data annotated with Ensembl IDs, what is a reasonable approach?
The top result in your example is the gene summary entry for ENSMUSG00000035171, the Ensembl gene ID for the gene.
The other result in your example is a direct link to the gene tree view for ENSMUSG00000035171.
As for why duplicate results are returned, you can examine the duplicate rows for ENSMUSG00000035171 in your `gene_entrezid` table to see what the discrepancy is. It is most likely down to changes in annotation across versions of the different annotation sources.

Thanks, I will check it later.
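For the second question, one reasonable approach is to make the mapping one-to-one before merging. The sketch below uses base R only and made-up IDs and symbols (hypothetical placeholders, not real genes). It lists the Ensembl IDs that have several symbols so they can be inspected as suggested above, collapses those symbols into one `;`-separated string per ID, and merges with `all.x = TRUE` so genes without a symbol are kept (with `SYMBOL = NA`) rather than dropped. Collapsing is only one choice; keeping just the first symbol per ID is another.

```r
# Toy mapping with one row per (ENSEMBL, SYMBOL) pair (hypothetical IDs)
mapping <- data.frame(
  ENSEMBL = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_B", "ENSMUSG_C"),
  SYMBOL  = c("GeneA", "GeneB1", "GeneB2", "GeneC")
)

# 1. Inspect: all rows whose Ensembl ID occurs more than once
dup_ids <- unique(mapping$ENSEMBL[duplicated(mapping$ENSEMBL)])
print(mapping[mapping$ENSEMBL %in% dup_ids, ])

# 2. Collapse: one row per Ensembl ID, multiple symbols joined with ";"
collapsed <- aggregate(SYMBOL ~ ENSEMBL, data = mapping,
                       FUN = function(s) paste(unique(s), collapse = ";"))

# 3. Attach symbols to a toy expression table; ENSMUSG_D has no symbol
counts <- data.frame(
  logFC = c(1.2, -0.5, 2.0, 0.3),
  row.names = c("ENSMUSG_A", "ENSMUSG_B", "ENSMUSG_C", "ENSMUSG_D")
)
counts$ENSEMBL <- rownames(counts)
annotated <- merge(counts, collapsed, by = "ENSEMBL", all.x = TRUE)

# Safe now: each Ensembl ID occurs exactly once
rownames(annotated) <- annotated$ENSEMBL
print(annotated)
```

With real data, `mapping` would be your `gene_entrezid` table and `counts` your `P_change` table. If you would rather resolve this at lookup time, `AnnotationDbi::mapIds()` has a `multiVals` argument (for example `"first"` or `"CharacterList"`) that controls how one-to-many matches are returned.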