Conversion between Ensembl identifiers and gene symbols obtained from different annotation files
lu.ne, 5.3 years ago

Hi everyone,

I've been looking at gene expression patterns in one dataset (dataset A), for which I have Ensembl gene identifiers obtained with the annotation file from Ensembl release 86 (GRCh38.p7). I now want to look at the patterns in another dataset (dataset B), in which genes are identified by gene symbols (annotated with GENCODE release 31, GRCh38.p12). I would like to match the identifiers between the two datasets and have considered several strategies, but I'm struggling to decide which one is best:

1/ Convert Ensembl gene identifiers from dataset A to gene symbols using biomaRt and the Ensembl version corresponding to GENCODE release 31

Possible issues: I'm unsure which Ensembl version would be best for this. Would it be the first Ensembl release with patch 12 (release 92, from April 2018), or is it more complicated than that?

2/ Convert gene symbols from dataset B to Ensembl gene identifiers using biomaRt and the Ensembl version corresponding to GENCODE release 31 (used to annotate dataset B)

Possible issues: As above, I'm not sure which Ensembl version to use, and I'm also not sure whether Ensembl gene identifiers refer to exactly the same entities across versions.
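
On the version question: as I understand Ensembl's scheme, the stable part of a gene ID (e.g. ENSG00000141510) is kept across releases, while an optional version suffix (`.N`, as written in GENCODE files) is bumped when the gene model changes. A minimal Python sketch, assuming the two ID lists differ only in whether they carry such a suffix, compares them on the unversioned part. The helper names are hypothetical and the example IDs and version numbers are only illustrative.

```python
def strip_version(gene_id: str) -> str:
    """Return the unversioned stable ID, e.g. 'ENSG00000141510.17' -> 'ENSG00000141510'."""
    # Everything before the first "." is the stable ID; the rest is the version.
    return gene_id.split(".", 1)[0]


def shared_stable_ids(ids_a, ids_b):
    """Stable IDs present in both collections, ignoring any version suffix."""
    return {strip_version(g) for g in ids_a} & {strip_version(g) for g in ids_b}


if __name__ == "__main__":
    # Illustrative IDs only; the version numbers are made up for the example.
    release_a = ["ENSG00000141510", "ENSG00000012048"]
    release_b = ["ENSG00000141510.17", "ENSG00000139618.15"]
    print(sorted(shared_stable_ids(release_a, release_b)))
```

Comparing on the unversioned ID treats a gene whose model was revised as the same gene; if the revisions matter for your analysis, compare the full versioned IDs instead.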

3/ Convert gene symbols from dataset B to Ensembl gene identifiers using biomaRt and the version corresponding to Ensembl version 86 (used for the annotation of dataset A)

Possible issues: Pretty much the same ones as 2/

4/ Re-annotate one of the two sets so that they are both annotated with the same file

Possible issues: I don't have access to the raw data...

I'm also thinking that I could be misunderstanding the 'stability' part of Ensembl gene identifiers or possibly overcomplicating the whole thing...

In any case, I'd appreciate some suggestions/explanations to help clear the fog away from my mind...

Thank you!

Emily, 5.3 years ago

The stable gene IDs (ENSGs) should refer to the same entity between releases, and they are far more stable than gene names, which change more frequently. I would recommend using the current BioMart to convert your list of gene names from the current release (GENCODE 31 = Ensembl 97 = latest data) to ENSGs; you can then reasonably compare these with your data from Ensembl 86.
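
The suggestion above can be sketched in Python, assuming you have exported a two-column table (gene stable ID, gene name) from the current BioMart as tab-separated text; the column layout and the function names are assumptions for illustration, not part of BioMart itself. The sketch builds a symbol-to-ID lookup, keeps only symbols that map to exactly one ENSG, and then checks those IDs against the Ensembl 86 IDs from dataset A.

```python
import csv
import io
from collections import defaultdict


def load_symbol_map(tsv_text):
    """Map gene symbol -> set of Ensembl gene IDs from a two-column TSV.

    Assumes (hypothetical layout) a header line followed by rows of
    gene_id<TAB>gene_name, e.g. a saved BioMart export.
    """
    mapping = defaultdict(set)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2 or not row[1]:
            continue  # skip blank lines and genes without a name
        gene_id, symbol = row[0], row[1]
        mapping[symbol].add(gene_id)
    return mapping


def match_symbols(symbols, symbol_map, ids_a):
    """Split dataset B symbols into matched, ambiguous and unmatched groups.

    matched:   symbol -> its single ENSG, which is also present in dataset A
    ambiguous: symbol -> sorted list of ENSGs (more than one hit)
    unmatched: symbols with no ENSG, or whose ENSG is absent from dataset A
    """
    ids_a = set(ids_a)
    matched, ambiguous, unmatched = {}, {}, []
    for sym in symbols:
        hits = symbol_map.get(sym, set())
        if len(hits) > 1:
            ambiguous[sym] = sorted(hits)
        elif len(hits) == 1 and next(iter(hits)) in ids_a:
            matched[sym] = next(iter(hits))
        else:
            unmatched.append(sym)
    return matched, ambiguous, unmatched


if __name__ == "__main__":
    # Made-up export: TP53/BRCA2 use real IDs, FAKE1 is invented to show ambiguity.
    export = (
        "Gene stable ID\tGene name\n"
        "ENSG00000141510\tTP53\n"
        "ENSG00000139618\tBRCA2\n"
        "ENSG_FAKE_A\tFAKE1\n"
        "ENSG_FAKE_B\tFAKE1\n"
    )
    dataset_a_ids = ["ENSG00000141510", "ENSG00000012048"]
    matched, ambiguous, unmatched = match_symbols(
        ["TP53", "BRCA2", "FAKE1", "NOTAGENE"], load_symbol_map(export), dataset_a_ids
    )
    print(matched)    # {'TP53': 'ENSG00000141510'}
    print(ambiguous)  # {'FAKE1': ['ENSG_FAKE_A', 'ENSG_FAKE_B']}
    print(unmatched)  # ['BRCA2', 'NOTAGENE']
```

Keeping ambiguous symbols separate, rather than picking one ID arbitrarily, leaves the decision about them to you; the unmatched list is also worth checking, since a symbol can be missing simply because it was renamed between releases.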

Thank you, that's helpful. Is there an easy way to identify the best matching BioMart version when dealing with GENCODE annotations (if I had files annotated with GENCODE version 29 or 30 for example)?

You can see the GENCODE version on the species information page. Go to the archives (bottom right) to see previous versions. It's also 1:1, so if you know one, you can just count back to find all the rest.

Super, just what I needed!
