Trying to map information between two Mus musculus strains using the biomaRt external_gene_name field
7.0 years ago
erwan.scaon ▴ 950

Dear all,

I am trying to "map" Mus musculus information between two strains (C57BL/6J and 129S1/SvImJ) based on the biomaRt "external_gene_name" field.

Using:
Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.2 (2017-09-28)
biomaRt ‘2.34.0’

Commands:
Retrieve C57BL/6J information:

library(biomaRt)

b6 = useMart("ENSEMBL_MART_ENSEMBL",
             dataset="mmusculus_gene_ensembl")

b6_infos <- getBM(attributes=c('ensembl_gene_id',
                               'external_gene_name'),
                  filters = 'chromosome_name',
                  values = '1',
                  mart = b6)

Retrieve 129S1/SvImJ information:

sv = useMart("ENSEMBL_MART_MOUSE",
             dataset="m129s1svimj_gene_ensembl")

sv_infos <- getBM(attributes=c('ensembl_gene_id',
                               'external_gene_name'),
                  filters = 'chromosome_name',
                  values = '1',
                  mart = sv)

Results:

> head(b6_infos)
     ensembl_gene_id external_gene_name
1 ENSMUSG00000102693      4933401J01Rik
2 ENSMUSG00000064842            Gm26206
3 ENSMUSG00000051951               Xkr4
4 ENSMUSG00000102851            Gm18956
5 ENSMUSG00000103377            Gm37180
6 ENSMUSG00000104017            Gm37363

> head(sv_infos)
          ensembl_gene_id external_gene_name
1 MGP_129S1SvImJ_G0000317
2 MGP_129S1SvImJ_G0037123            Gm26206
3 MGP_129S1SvImJ_G0015807               Xkr4
4 MGP_129S1SvImJ_G0009955
5 MGP_129S1SvImJ_G0000318
6 MGP_129S1SvImJ_G0000319

Now let's pick a row with an empty "external_gene_name" in sv_infos, for example the one with ensembl_gene_id = MGP_129S1SvImJ_G0009955.
If I check "manually" on the Ensembl website, MGP_129S1SvImJ_G0009955 has a "Reference strain equivalent" = Gm18956 (which would allow me to "map" things between the two strains). This external_gene_name (Gm18956) is present in the C57BL/6J table, with the corresponding ensembl_gene_id = ENSMUSG00000102851 (row 4 of each head() output above).
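To make the problem concrete, here is a minimal sketch of the name-based "mapping", using only the six rows printed above, re-typed as data frames (in real use they come from the getBM() calls). It assumes that missing names come back as empty strings, as the blank cells suggest. The tables are joined with base R merge() on external_gene_name; MGP_129S1SvImJ_G0009955 drops out because its name is empty, so it is never paired with Gm18956 / ENSMUSG00000102851. The helper has_name() is an illustrative name, not part of biomaRt.

```r
# The first six rows printed above, re-typed for illustration.
# stringsAsFactors = FALSE because R 3.4 converts strings to factors by default.
b6_infos <- data.frame(
  ensembl_gene_id = c("ENSMUSG00000102693", "ENSMUSG00000064842",
                      "ENSMUSG00000051951", "ENSMUSG00000102851",
                      "ENSMUSG00000103377", "ENSMUSG00000104017"),
  external_gene_name = c("4933401J01Rik", "Gm26206", "Xkr4",
                         "Gm18956", "Gm37180", "Gm37363"),
  stringsAsFactors = FALSE
)

sv_infos <- data.frame(
  ensembl_gene_id = c("MGP_129S1SvImJ_G0000317", "MGP_129S1SvImJ_G0037123",
                      "MGP_129S1SvImJ_G0015807", "MGP_129S1SvImJ_G0009955",
                      "MGP_129S1SvImJ_G0000318", "MGP_129S1SvImJ_G0000319"),
  external_gene_name = c("", "Gm26206", "Xkr4", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for rows that actually carry a gene name.
# Empty names must be dropped, otherwise merge() would pair every unnamed
# 129S1 gene with every unnamed B6 gene.
has_name <- function(df) {
  !is.na(df$external_gene_name) & df$external_gene_name != ""
}

# Inner join on the gene name; the suffixes mark which strain each ID comes from.
mapped <- merge(b6_infos[has_name(b6_infos), ],
                sv_infos[has_name(sv_infos), ],
                by = "external_gene_name",
                suffixes = c("_b6", "_129"))
mapped
```

Only Gm26206 and Xkr4 are mapped; the Gm18956 pair is lost because the 129S1/SvImJ side has no name to join on.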

What am I missing?
Shouldn't MGP_129S1SvImJ_G0009955 have external_gene_name = Gm18956 in the 129S1/SvImJ dataset?

7.0 years ago
Mike Smith ★ 2.1k

The page you link to also shows a blank field for 'Name' in the transcript table. Given this, it looks like it's an annotation choice/database issue, rather than something that's wrong with your biomaRt query.

This doesn't answer your question as to why the link between those two genes isn't present in the database, but if you want to simplify your query so that it only returns results where a homologous relationship is recorded, you can use the getLDS() function directly, for example:

library(biomaRt)

b6 = useMart("ENSEMBL_MART_ENSEMBL",
             dataset="mmusculus_gene_ensembl")

sv = useMart("ENSEMBL_MART_MOUSE",
             dataset="m129s1svimj_gene_ensembl")

paired_genes <- getLDS(attributes = c('ensembl_gene_id',
                                      'external_gene_name'),
                       filters = 'chromosome_name',
                       values = '1',
                       mart = b6,
                       attributesL = c('ensembl_gene_id',
                                       'external_gene_name'),
                       filtersL = 'chromosome_name',
                       valuesL = '1',
                       martL = sv)

Here's an example of the output:

> head(paired_genes)
      Gene.stable.ID     Gene.name        Gene.stable.ID.1   Gene.name.1
1 ENSMUSG00000089358       Gm25491 MGP_129S1SvImJ_G0009375       Gm25491
2 ENSMUSG00000067879 3110035E14Rik MGP_129S1SvImJ_G0015825 3110035E14Rik
3 ENSMUSG00000079658          Eloc MGP_129S1SvImJ_G0015857         Tceb1
4 ENSMUSG00000098234         Snhg6 MGP_129S1SvImJ_G0004772         Snhg6
5 ENSMUSG00000045210        Vcpip1 MGP_129S1SvImJ_G0015827        Vcpip1
6 ENSMUSG00000099032         Tcf24 MGP_129S1SvImJ_G0015830         Tcf24
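Row 3 of this output shows a further reason to rely on the IDs returned by getLDS() rather than on a join by gene name: the same pair is named Eloc in C57BL/6J but Tceb1 in 129S1/SvImJ. The sketch below re-types the six printed rows (in real use, paired_genes is the data frame returned by getLDS()), lists the pairs whose names disagree, and builds a lookup keyed on the stable IDs. The variable names renamed and b6_to_129 are illustrative.

```r
# The six rows printed above, re-typed for illustration.
paired_genes <- data.frame(
  Gene.stable.ID = c("ENSMUSG00000089358", "ENSMUSG00000067879",
                     "ENSMUSG00000079658", "ENSMUSG00000098234",
                     "ENSMUSG00000045210", "ENSMUSG00000099032"),
  Gene.name = c("Gm25491", "3110035E14Rik", "Eloc",
                "Snhg6", "Vcpip1", "Tcf24"),
  Gene.stable.ID.1 = c("MGP_129S1SvImJ_G0009375", "MGP_129S1SvImJ_G0015825",
                       "MGP_129S1SvImJ_G0015857", "MGP_129S1SvImJ_G0004772",
                       "MGP_129S1SvImJ_G0015827", "MGP_129S1SvImJ_G0015830"),
  Gene.name.1 = c("Gm25491", "3110035E14Rik", "Tceb1",
                  "Snhg6", "Vcpip1", "Tcf24"),
  stringsAsFactors = FALSE
)

# Pairs whose gene names differ between strains: a join on
# external_gene_name alone would silently drop these.
renamed <- paired_genes[paired_genes$Gene.name != paired_genes$Gene.name.1, ]
renamed

# Named vector: B6 stable ID -> 129S1/SvImJ stable ID.
# Built from the IDs, so renamed genes are still mapped.
b6_to_129 <- setNames(paired_genes$Gene.stable.ID.1, paired_genes$Gene.stable.ID)
b6_to_129[["ENSMUSG00000079658"]]
```

If one B6 gene is paired with several 129S1/SvImJ genes, the named vector contains repeated names and `[[` returns only the first match; `split(paired_genes$Gene.stable.ID.1, paired_genes$Gene.stable.ID)` keeps all of them as a list.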