Hi everyone,
I've been looking at gene expression patterns in one dataset (dataset A) for which I have Ensembl gene identifiers obtained using the annotation file corresponding to Ensembl version 86 (GRCh38.p7). I now want to look at the patterns in another dataset (dataset B) for which the genes are identified by gene symbols (annotated using Gencode release 31, GRCh38.p12). I would like to match the identifiers from both datasets and have considered several strategies, but I'm struggling to work out which one would be best:
1/ Convert Ensembl gene identifiers from dataset A to gene symbols using biomaRt and the version corresponding to Gencode release 31
Possible issues: I'm unsure which version of Ensembl would be best for this. Would it be the first Ensembl version with patch 12 (that would be version 92 from April 2018), or is it more complicated than that?
2/ Convert gene symbols from dataset B to Ensembl gene identifiers using biomaRt and the Ensembl version corresponding to Gencode release 31 (used for the annotation of dataset B)
Possible issues: As before, I'm not sure which Ensembl version to use, and I'm also not entirely sure that Ensembl gene identifiers refer to exactly the same things between versions.
3/ Convert gene symbols from dataset B to Ensembl gene identifiers using biomaRt and the version corresponding to Ensembl version 86 (used for the annotation of dataset A)
Possible issues: Pretty much the same ones as 2/
4/ Re-annotate one of the two sets so that they are both annotated with the same file
Possible issues: I don't have access to the raw data...
I'm also thinking that I could be misunderstanding the 'stability' part of Ensembl gene identifiers or possibly overcomplicating the whole thing...
In any case, I'd appreciate some suggestions/explanations to help clear the fog away from my mind...
Thank you!
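For what it's worth, strategy 2/ can be sketched without BioMart if the GENCODE GTF used for dataset B is at hand. The Python below is a minimal sketch, not a definitive method: it assumes the GTF has `gene` rows whose attribute column carries `gene_id` and `gene_name`, and it assumes that matching on the unversioned stable ID (the part before the dot) is acceptable for the comparison. The function names, the toy GTF lines, the symbol `DUP1` and the `ENSG9999...` identifiers are all made up for illustration.

```python
import re
from collections import defaultdict

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')
GENE_NAME_RE = re.compile(r'gene_name "([^"]+)"')


def strip_version(gene_id):
    """Drop the version suffix: 'ENSG00000141510.16' -> 'ENSG00000141510'."""
    return gene_id.split(".", 1)[0]


def symbol_to_ids(gtf_lines):
    """Map each gene symbol to the set of unversioned gene IDs carrying it."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # keep only gene-level rows
        gene_id = GENE_ID_RE.search(fields[8])
        gene_name = GENE_NAME_RE.search(fields[8])
        if gene_id and gene_name:
            mapping[gene_name.group(1)].add(strip_version(gene_id.group(1)))
    return mapping


def match_datasets(ids_a, symbols_b, mapping):
    """Split dataset B symbols into matched, ambiguous and missing."""
    ids_a = {strip_version(g) for g in ids_a}
    matched, ambiguous, missing = {}, {}, []
    for symbol in symbols_b:
        candidates = mapping.get(symbol, set())
        if len(candidates) > 1:
            ambiguous[symbol] = sorted(candidates)  # one symbol, several IDs
        elif len(candidates) == 1 and next(iter(candidates)) in ids_a:
            matched[symbol] = next(iter(candidates))
        else:
            missing.append(symbol)  # unknown symbol, or ID absent from A
    return matched, ambiguous, missing


def _toy_row(attributes):
    # Coordinates are placeholders; only the attribute column matters here.
    return "\t".join(["chr17", "HAVANA", "gene", "1", "100", ".", "-", ".", attributes])


# Toy input: version suffixes and the DUP1 / ENSG9999... entries are invented.
TOY_GTF = [
    "##provider: GENCODE",
    _toy_row('gene_id "ENSG00000141510.16"; gene_name "TP53";'),
    _toy_row('gene_id "ENSG00000012048.22"; gene_name "BRCA1";'),
    _toy_row('gene_id "ENSG99999999991.1"; gene_name "DUP1";'),
    _toy_row('gene_id "ENSG99999999992.1"; gene_name "DUP1";'),
]

if __name__ == "__main__":
    mapping = symbol_to_ids(TOY_GTF)
    result = match_datasets(
        ["ENSG00000141510", "ENSG00000012048"],
        ["TP53", "BRCA1", "DUP1", "NOPE"],
        mapping,
    )
    print(result)
```

Reporting the ambiguous and missing symbols separately, rather than dropping them silently, makes it easier to see how many genes are lost by converting between the two annotations.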
Thank you, that's helpful. Is there an easy way to identify the best matching BioMart version when dealing with GENCODE annotations (if I had files annotated with GENCODE version 29 or 30 for example)?
You can see the GENCODE version on the Ensembl species information page. Go to the archives (bottom right) to see previous versions. The GENCODE-to-Ensembl release mapping is also 1:1, so if you know one pair, you can just count back to find all the rest.
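If the GENCODE GTF itself is available, the matching Ensembl release can usually be read from its header as well. The Python below is a small sketch that assumes the file starts with a `##description:` line of the form shown in the example (assembly in parentheses, then `version N (Ensembl M)`); the function name is hypothetical and the regular expression returns nothing if the header is laid out differently.

```python
import re

# Matches e.g. "(GRCh38.p12), version 31 (Ensembl 97)"
DESCRIPTION_RE = re.compile(
    r"\(([^)]+)\),\s*version\s+(\S+)\s+\(Ensembl\s+(\d+)\)"
)


def gencode_header_info(lines):
    """Return assembly, GENCODE version and Ensembl release from a GTF header.

    Returns None when no matching '##description:' line is found before
    the first non-comment line.
    """
    for line in lines:
        if not line.startswith("#"):
            break  # header is over, stop reading
        if line.startswith("##description:"):
            match = DESCRIPTION_RE.search(line)
            if match:
                return {
                    "assembly": match.group(1),
                    "gencode": match.group(2),  # kept as text, e.g. "31"
                    "ensembl": int(match.group(3)),
                }
    return None


if __name__ == "__main__":
    # Example header in the assumed layout.
    header = [
        "##description: evidence-based annotation of the human genome "
        "(GRCh38.p12), version 31 (Ensembl 97)",
        "##provider: GENCODE",
        "##format: gtf",
    ]
    print(gencode_header_info(header))
```

In practice the lines would come from the opened annotation file (for example via `gzip.open(path, "rt")`), and the Ensembl release number found this way is the one to pick in the BioMart archives.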
Super, just what I needed!