Question

I encountered a problem while using bitr() for gene ID conversion.

0

4 weeks ago

Pallondyle • 0

Some ENSEMBL IDs seem to match more than one gene ID. It doesn't make sense. My code:

## Merge gene ID columns
gene_entrezid <- bitr(geneID = rownames(P_change), 
                      fromType = "ENSEMBL", 
                      toType = "SYMBOL", # ENTREZID conversion
                      OrgDb = "org.Mm.eg.db"
)
ID_bind<-function(a){
    a2<-cbind(a,rownames(a))
    colnames(a2)[7]<-"ENSEMBL"
    a2<-merge(a2,gene_entrezid,by="ENSEMBL",all.y =F)
    rownames(a2)<-a2$ENSEMBL
    a2<-a2[,-1]
    return(a2)
}
P_change<-ID_bind(P_change)

Error:

#Error in `.rowNamesDF<-`(x, value = value) : 
#  duplicate row.names are not allowed
#In addition: Warning message:
#non-unique values when setting 'row.names': ‘ENSMUSG00000000486’, ‘ENSMUSG00000000562’, ‘ENSMUSG00000001768’, #‘ENSMUSG00000002250’, ‘ENSMUSG00000003271’, ‘ENSMUSG00000003680’, ‘ENSMUSG00000003812’, ‘ENSMUSG00000004455’, ‘ENSMUSG00000005983’, ‘ENSMUSG00000015290’, ‘ENSMUSG00000015341’, ‘ENSMUSG00000015882’, ‘ENSMUSG00000018378’, ‘ENSMUSG00000019865’, ‘ENSMUSG00000019868’, ‘ENSMUSG00000021557’, ‘ENSMUSG00000021846’, ‘ENSMUSG00000021983’, ‘ENSMUSG00000022820’, ‘ENSMUSG00000023156’, ‘ENSMUSG00000024571’, ‘ENSMUSG00000025194’, ‘ENSMUSG00000025646’, ‘ENSMUSG00000027022’, ‘ENSMUSG00000028700’, ‘ENSMUSG00000029089’, ‘ENSMUSG00000029592’, ‘ENSMUSG00000029723’, ‘ENSMUSG00000030337’, ‘ENSMUSG00000031167’, ‘ENSMUSG00000032750’, ‘ENSMUSG00000032872’, ‘ENSMUSG00000035171’, ‘ENSMUSG00000036381’, ‘ENSMUSG00000037747’, ‘ENSMUSG00000038209’, ‘ENSMUSG0000004019 [... truncated]

I searched for the corresponding gene information on the official website, but strangely, searching for one of these gene IDs also returns entries that seem to belong to genes with completely different ENSEMBL IDs.

[Screenshot: search results on the website]

I am currently confused about two things: Firstly, how did this happen? Secondly, if I still want to associate gene IDs with transcriptome data annotated with ENSEMBL, what is a reasonable approach?

bioMart • 351 views

score 0 · Answer 1 · 2024-11-17

0

4 weeks ago

Pallondyle • 0

I solved this problem by downloading data directly from the ENSEMBL website. But I am still curious about how such a thing happened.

1

The top result in your example is the gene summary entry for ENSMUSG00000035171, which is the Ensembl gene ID for the gene.

The other result in your example is a direct link to the gene tree view for the same ID, ENSMUSG00000035171, so both hits refer to one gene.

As for why duplicate results are returned, you can examine the duplicate rows for ENSMUSG00000035171 in your gene_entrezid table to see what the discrepancy is. It is most likely down to annotation changes between versions of the different annotation sources.
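
The inspection suggested above could be sketched in base R roughly as follows. This is only an illustration: the small data frame is a made-up stand-in for the table returned by bitr(), and the SYMBOL values (SymA, SymB, SymC) are placeholders, not real annotations for these IDs. Keeping the first mapping per ID is one possible policy, not necessarily the right one for every analysis.

```r
# Toy stand-in for the bitr() result. The SYMBOL values are placeholders,
# NOT real annotations for these Ensembl IDs.
gene_entrezid <- data.frame(
  ENSEMBL = c("ENSMUSG00000035171", "ENSMUSG00000035171", "ENSMUSG00000000486"),
  SYMBOL  = c("SymA", "SymB", "SymC"),
  stringsAsFactors = FALSE
)

# Ensembl IDs that occur more than once, i.e. one-to-many mappings
dup_ids <- unique(gene_entrezid$ENSEMBL[duplicated(gene_entrezid$ENSEMBL)])

# All rows for those IDs, so the competing symbols can be compared side by side
dup_rows <- gene_entrezid[gene_entrezid$ENSEMBL %in% dup_ids, ]
print(dup_rows)

# One possible policy (an assumption, not a recommendation from the thread):
# keep only the first mapping per Ensembl ID, so that the ENSEMBL column is
# unique and can safely be used as row names after merge()
gene_unique <- gene_entrezid[!duplicated(gene_entrezid$ENSEMBL), ]
```

With a unique ENSEMBL column, the `rownames(a2) <- a2$ENSEMBL` step in ID_bind() no longer hits the "duplicate row.names" error, because merge() then returns at most one row per input gene.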