Dear all,
I am trying to map Mus musculus gene information between two strains (C57BL/6J and 129S1/SvImJ) using the biomaRt `external_gene_name` attribute.
Setup:
- R 3.4.2 (2017-09-28)
- Bioconductor 3.6 (BiocInstaller 1.28.0)
- biomaRt 2.34.0
Commands:

Retrieve the C57BL/6J (reference strain) genes on chromosome 1:

```r
library(biomaRt)

b6 <- useMart("ENSEMBL_MART_ENSEMBL",
              dataset = "mmusculus_gene_ensembl")
b6_infos <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                  filters = "chromosome_name",
                  values = "1",
                  mart = b6)
```
Retrieve the 129S1/SvImJ genes on chromosome 1:

```r
sv <- useMart("ENSEMBL_MART_MOUSE",
              dataset = "m129s1svimj_gene_ensembl")
sv_infos <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                  filters = "chromosome_name",
                  values = "1",
                  mart = sv)
```
Results:

```r
> head(b6_infos)
     ensembl_gene_id external_gene_name
1 ENSMUSG00000102693      4933401J01Rik
2 ENSMUSG00000064842            Gm26206
3 ENSMUSG00000051951               Xkr4
4 ENSMUSG00000102851            Gm18956
5 ENSMUSG00000103377            Gm37180
6 ENSMUSG00000104017            Gm37363
> head(sv_infos)
          ensembl_gene_id external_gene_name
1 MGP_129S1SvImJ_G0000317
2 MGP_129S1SvImJ_G0037123            Gm26206
3 MGP_129S1SvImJ_G0015807               Xkr4
4 MGP_129S1SvImJ_G0009955
5 MGP_129S1SvImJ_G0000318
6 MGP_129S1SvImJ_G0000319
```
Now pick a row of `sv_infos` with an empty `external_gene_name`, for example the one with `ensembl_gene_id` = MGP_129S1SvImJ_G0009955.
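For illustration, the unnamed rows can be pulled out with base R subsetting. This is a minimal sketch: the data frame below is a toy copy of the six rows printed by `head(sv_infos)`, standing in for the real `getBM()` result, and it assumes missing names come back as empty strings, as they appear in the printout (`NA` is also caught, in case they do not).

```r
# Toy copy of the six rows printed by head(sv_infos) above;
# with real data, use the sv_infos returned by getBM() instead.
sv_infos <- data.frame(
  ensembl_gene_id = c("MGP_129S1SvImJ_G0000317", "MGP_129S1SvImJ_G0037123",
                      "MGP_129S1SvImJ_G0015807", "MGP_129S1SvImJ_G0009955",
                      "MGP_129S1SvImJ_G0000318", "MGP_129S1SvImJ_G0000319"),
  external_gene_name = c("", "Gm26206", "Xkr4", "", "", ""),
  stringsAsFactors = FALSE
)

# Keep rows whose external_gene_name is empty (or NA, to be safe)
unnamed <- sv_infos[is.na(sv_infos$external_gene_name) |
                      sv_infos$external_gene_name == "", ]

# Gene IDs that have no name to map on
unnamed$ensembl_gene_id
```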
If I check this gene manually on the Ensembl website, MGP_129S1SvImJ_G0009955 has a "Reference strain equivalent" of Gm18956, which is exactly what I need to map between the two strains. This `external_gene_name` (Gm18956) is present in the C57BL/6J table, with `ensembl_gene_id` = ENSMUSG00000102851 (row 4 in both outputs above).
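The intended mapping amounts to joining the two tables on `external_gene_name`. A minimal sketch with base R `merge()`, again using toy copies of the printed rows as assumed stand-ins for the real `getBM()` results. Empty names are dropped first, because otherwise every unnamed gene in one table would match every unnamed gene in the other. With the names as returned, MGP_129S1SvImJ_G0009955 has nothing to join on and is left out of the result.

```r
# Toy copies of the rows printed by head(b6_infos) and head(sv_infos) above
b6_infos <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000102693", "ENSMUSG00000064842",
                      "ENSMUSG00000051951", "ENSMUSG00000102851",
                      "ENSMUSG00000103377", "ENSMUSG00000104017"),
  external_gene_name = c("4933401J01Rik", "Gm26206", "Xkr4",
                         "Gm18956", "Gm37180", "Gm37363"),
  stringsAsFactors = FALSE
)
sv_infos <- data.frame(
  ensembl_gene_id = c("MGP_129S1SvImJ_G0000317", "MGP_129S1SvImJ_G0037123",
                      "MGP_129S1SvImJ_G0015807", "MGP_129S1SvImJ_G0009955",
                      "MGP_129S1SvImJ_G0000318", "MGP_129S1SvImJ_G0000319"),
  external_gene_name = c("", "Gm26206", "Xkr4", "", "", ""),
  stringsAsFactors = FALSE
)

# TRUE for names that are neither NA nor the empty string
has_name <- function(x) !is.na(x) & x != ""

# Drop unnamed genes so "" cannot match "" across the two tables
b6_named <- b6_infos[has_name(b6_infos$external_gene_name), ]
sv_named <- sv_infos[has_name(sv_infos$external_gene_name), ]

# Inner join on the shared gene name; suffixes mark which strain each ID is from
mapped <- merge(b6_named, sv_named, by = "external_gene_name",
                suffixes = c("_b6", "_sv"))
mapped
```

An inner join keeps only genes named in both strains; `all.x = TRUE` would instead keep every C57BL/6J gene and fill the 129S1/SvImJ ID with `NA` where no name matches.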
What am I missing?
Shouldn't MGP_129S1SvImJ_G0009955 have `external_gene_name` = Gm18956 in the 129S1/SvImJ dataset?