Ensembl mysql database - data discrepancy versus the web page of gene FGFR2
2.6 years ago
rawi • 0

Hello everybody

I installed locally the homo_sapiens_*_106_37 databases.

Assuming I understand the table structure and relations in homo_sapiens_core reasonably well, I ran this simple SQL query:

SELECT 
    transcript.stable_id AS transcript_stable_id,
    transcript_display.display_label AS transcript_display_label,
    CASE WHEN gene.canonical_transcript_id = transcript.transcript_id THEN 'YES' ELSE '' END AS canonical
FROM
    gene
    INNER JOIN transcript USING (gene_id)
    INNER JOIN xref AS transcript_display ON transcript_display.xref_id=transcript.display_xref_id
WHERE 
    gene.stable_id = 'ENSG00000066468'
ORDER BY 
    transcript_display.display_label;

What I get is:

transcript_stable_id    transcript_display_label    canonical
ENST00000358487 FGFR2-001   
ENST00000336553 FGFR2-003   
ENST00000360144 FGFR2-004   
ENST00000369060 FGFR2-005   
ENST00000369056 FGFR2-006   
ENST00000369059 FGFR2-007   
ENST00000478859 FGFR2-008   
ENST00000346997 FGFR2-009   
ENST00000457416 FGFR2-010   YES
ENST00000369058 FGFR2-011   
ENST00000356226 FGFR2-012   
ENST00000491111 FGFR2-013   
ENST00000490349 FGFR2-015   
ENST00000359354 FGFR2-016   
ENST00000467584 FGFR2-020   
ENST00000429361 FGFR2-021   
ENST00000491475 FGFR2-022   
ENST00000463870 FGFR2-023   
ENST00000604236 FGFR2-024   
ENST00000351936 FGFR2-201   
ENST00000357555 FGFR2-202   
ENST00000369061 FGFR2-203   

However, on http://www.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000066468;r=10:121478332-121598458 I see that the canonical transcript is "FGFR2-206", not "FGFR2-010" as in my result.

I could not find the display label "FGFR2-206" anywhere in the xref table.

I also get fewer transcripts than are listed on the web page.

Can someone please tell me where the other transcripts come from, and why the canonical transcript in the database is not the one named on the web page?

Thanks a lot!

Rawi

database mysql ensembl • 826 views

When I run a similar query for the canonical transcript using biomaRt:

library(biomaRt)

# declare the BioMart database and dataset to use
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")

# request the canonical flag for every transcript of the gene
getBM(attributes = c("transcript_is_canonical", "ensembl_gene_id", "ensembl_transcript_id"),
      filters = "ensembl_gene_id",
      values = "ENSG00000066468",
      mart = ensembl)

I get the following output, with the same canonical transcript as the Ensembl web page for the gene and a similar number of transcripts:

 transcript_is_canonical ensembl_gene_id ensembl_transcript_id
1                        1 ENSG00000066468       ENST00000358487
2                       NA ENSG00000066468       ENST00000369061
3                       NA ENSG00000066468       ENST00000357555
4                       NA ENSG00000066468       ENST00000613048
5                       NA ENSG00000066468       ENST00000684516
6                       NA ENSG00000066468       ENST00000682904
7                       NA ENSG00000066468       ENST00000682296
8                       NA ENSG00000066468       ENST00000638709
9                       NA ENSG00000066468       ENST00000682772
10                      NA ENSG00000066468       ENST00000683418
11                      NA ENSG00000066468       ENST00000351936
12                      NA ENSG00000066468       ENST00000682550
13                      NA ENSG00000066468       ENST00000683211
14                      NA ENSG00000066468       ENST00000683029
15                      NA ENSG00000066468       ENST00000683250
16                      NA ENSG00000066468       ENST00000478859
17                      NA ENSG00000066468       ENST00000684153
18                      NA ENSG00000066468       ENST00000356226
19                      NA ENSG00000066468       ENST00000369060
20                      NA ENSG00000066468       ENST00000604236
21                      NA ENSG00000066468       ENST00000369059
22                      NA ENSG00000066468       ENST00000467584
23                      NA ENSG00000066468       ENST00000429361
24                      NA ENSG00000066468       ENST00000346997
25                      NA ENSG00000066468       ENST00000457416
26                      NA ENSG00000066468       ENST00000360144
27                      NA ENSG00000066468       ENST00000369056
28                      NA ENSG00000066468       ENST00000683885
29                      NA ENSG00000066468       ENST00000369058
30                      NA ENSG00000066468       ENST00000336553
31                      NA ENSG00000066468       ENST00000463870
32                      NA ENSG00000066468       ENST00000683678
33                      NA ENSG00000066468       ENST00000682400
34                      NA ENSG00000066468       ENST00000490349
35                      NA ENSG00000066468       ENST00000359354
36                      NA ENSG00000066468       ENST00000491475
37                      NA ENSG00000066468       ENST00000613324
38                      NA ENSG00000066468       ENST00000611527
39                      NA ENSG00000066468       ENST00000636922
40                      NA ENSG00000066468       ENST00000683035
41                      NA ENSG00000066468       ENST00000491111

I have never used locally installed databases, but is it possible you are using a different version of the database (possibly an older version)?


Thanks manaswwm, it was indeed the version of the database, as pointed out by GenoMax, and I was also looking at the wrong Ensembl site.

2.6 years ago
GenoMax 147k

"I installed locally the homo_sapiens_*_106_37 databases."

You installed the older GRCh37 human genome assembly locally (note the 37 in the name), whereas the data shown on the main Ensembl web page (www.ensembl.org) is for the current GRCh38 assembly.

Here is the corresponding web page for GRCh37: http://grch37.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000066468;r=10:123237848-123357972

You can try installing the GRCh38 database (it will have 38 in its name) and see what you get.
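
The hint in the database name can also be checked programmatically. Below is a minimal Python sketch, assuming the database names follow the pattern species_type_release_assembly (as in homo_sapiens_core_106_37). The function name `parse_ensembl_db_name` is hypothetical and only for illustration, and the "GRCh" prefix applies to human databases only.

```python
def parse_ensembl_db_name(name: str) -> dict:
    """Split a name such as 'homo_sapiens_core_106_37' into its parts.

    Assumes the layout <species>_<db type>_<release>_<assembly>.
    """
    try:
        # The last two underscore-separated fields are release and assembly.
        prefix, release, assembly = name.rsplit("_", 2)
        # The field before them is the database type (core, variation, ...).
        species, db_type = prefix.rsplit("_", 1)
    except ValueError:
        raise ValueError(f"unexpected database name: {name!r}") from None
    if not release.isdigit():
        raise ValueError(f"release is not numeric in: {name!r}")
    return {
        "species": species,
        "db_type": db_type,
        "release": int(release),
        "assembly": assembly,
    }


info = parse_ensembl_db_name("homo_sapiens_core_106_37")
# For human databases the assembly suffix maps onto the GRCh build number.
print(f"{info['species']} {info['db_type']}: release {info['release']}, "
      f"assembly GRCh{info['assembly']}")
```

If the assembly printed here is GRCh37, the matching web pages are the ones on grch37.ensembl.org rather than www.ensembl.org.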


GenoMax, thank you very much for opening my eyes. I hadn't noticed that I wasn't on the GRCh37 site.

Although the transcripts on the GRCh37 web page have no "canonical" flag, the first transcript listed there is FGFR2-001, which translates to the same 821 aa protein as the canonical FGFR2-206 on the GRCh38 web page.

The FGFR2-010 transcript that I found flagged as canonical in the GRCh37 database translates to 822 aa, but it appears further down the list.

And yes, the same query against the GRCh38 database is consistent with the GRCh38 web page.

Many Thanks again!
