Conversion of gene ids between Gencode V24 and Gencode V38

4 days ago

Sayan • 0

Hi,

I have two RNA-seq datasets. In one dataset, the genes are named after Gencode V24; in the other, they are named after Gencode V38. I am trying to map these two datasets onto each other. Can someone please provide guidance on converting gene IDs across Gencode versions?

Many Thanks!!

gencode • 230 views

ADD COMMENT • link updated 3 days ago by ATpoint 85k • written 4 days ago by Sayan • 0

Aren't Ensembl gene IDs stable, so you could simply take the intersect?

ADD REPLY • link 4 days ago by ATpoint 85k

Thanks for the reply! Here's an example

     chromosome  start      end        strand  gene_id_v24         gene_name  gene_id_v38
0    chrX        100627109  100639991  -       ENSG00000000003.14  TSPAN6     ENSG00000000003.15

So, are you recommending that I map using only the ID ENSG00000000003 and ignore the .XX suffix (i.e., the version)?

ADD REPLY • link 4 days ago by Sayan • 0

The number after the dot is the version number. I'd be lying if I said I knew precisely what it represents, but I always focus on the core gene ID. That is the advantage of Ensembl IDs: they're stable, so they can be tracked reliably across versions. So yes, I would go by the Ensembl gene ID only. Both v24 and v38 are based on the same genome (GRCh38), after all.
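For illustration, here is a minimal Python sketch of that approach: strip the version suffix, then intersect on the core ID. The helper names (`strip_version`, `map_by_core_id`) and the placeholder ID `ENSG99999999999.1` are made up for this example; only the TSPAN6 IDs come from the table above. It assumes every ID has the form `<core>.<version>` and drops everything after the first dot, so check for IDs that collapse to the same core ID before relying on it.

```python
def strip_version(gene_id: str) -> str:
    """Return the part of a versioned gene ID before the first dot."""
    return gene_id.split(".", 1)[0]


def map_by_core_id(ids_a, ids_b):
    """Map each shared core ID to its (versioned ID in A, versioned ID in B).

    Assumption: within one list, no two IDs share the same core ID.
    If they do, the later one silently overwrites the earlier one.
    """
    core_a = {strip_version(g): g for g in ids_a}
    core_b = {strip_version(g): g for g in ids_b}
    shared = core_a.keys() & core_b.keys()  # the intersect
    return {core: (core_a[core], core_b[core]) for core in sorted(shared)}


# TSPAN6 IDs are from the example above; the second v24 ID is a
# hypothetical placeholder with no counterpart in the v38 list.
v24_ids = ["ENSG00000000003.14", "ENSG99999999999.1"]
v38_ids = ["ENSG00000000003.15"]

mapping = map_by_core_id(v24_ids, v38_ids)
for core, (id_v24, id_v38) in mapping.items():
    print(core, id_v24, id_v38)
```

Genes present in only one of the two lists simply drop out of the mapping, so it is worth comparing its size with the sizes of the input lists to see how many IDs went unmatched.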

ADD REPLY • link 3 days ago by ATpoint 85k
