GO term analysis with Ensembl gene IDs
7.8 years ago
yuxinghai ▴ 10

I got a set of Ensembl gene IDs from a differential expression analysis with DESeq2. I want to perform GO enrichment analysis, but almost half of the IDs are not recognized by DAVID. Some people said I could use BioMart in Ensembl to get the corresponding GO terms for each gene, but what should I do next?

RNA-Seq gene go ensembl • 8.6k views

Give GeneSCF a try. It supports Ensembl IDs.


Sorry to say this, but GeneSCF does not support Ensembl IDs directly. You can, however, convert them into gene symbols or Entrez IDs and use those with GeneSCF.


It's a pity that it doesn't work with EnsEMBL. In my work I find EnsEMBL a much better resource than NCBI.


It was a problem when I tried to implement Ensembl support in GeneSCF, because for some gene symbols the Ensembl ID (ENSG) varies depending on the Ensembl version.

For example, for KCNQ1OT1 I see a different ENSG ID in old Ensembl (ENSG00000258492.1, GRCh37.66, GENCODE v11) and in new Ensembl (ENSG00000269821.1, GRCh37.74-75, GENCODE v19). The only thing that stayed constant was the gene symbol or Entrez ID for this gene.

At least if I have something constant (fixed) like gene symbols (I can easily deal with multiple aliases) or Entrez IDs, I can use it confidently; otherwise the results might be misleading.


Don't use the .x version suffix of EnsEMBL IDs; they should be more stable that way. Gene symbols are not stable either (although I must say they change less often than they did a few years ago). The underlying problem is to define what a gene is and to work with that definition consistently. It seems that for you a gene is defined by whatever shares the same symbol. This is reasonable, as it is more or less the definition used by biologists, but as you've already experienced, it can create computational problems. It is also not always the best definition to use, especially when the underlying genome matters. The problem with Entrez is that it is unclear what a gene is. From this paper:

A GeneID is usually assigned to what is annotated as a gene on a RefSeq record. ... A GeneID may also be assigned when no RefSeq exists.

And from the RefSeq book section on curation:

A sequence record unambiguously associated with a Gene record may be propagated into a RefSeq record.

This looks very circular and ad hoc to me.

A RefSeq record is suppressed if it is found to represent a transcribed repeat element, ... or not to represent a "gene".

Notice the quotes around the word gene, which I take to indicate that there's no formal definition of the term.

Anyway, the conclusion is that there are different definitions of what a gene is and that one should pick a reference and stick to it for the duration of a project or risk inconsistent results.
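
The suggestion above to drop the .x version suffix can be sketched in a few lines of Python. This is only an illustration: the helper name `strip_version` is made up here, and the first two IDs are the KCNQ1OT1 examples quoted earlier in the thread.

```python
def strip_version(ensembl_id: str) -> str:
    """Return the stable ID without a trailing '.N' version suffix."""
    base, sep, version = ensembl_id.rpartition(".")
    # Only strip when the part after the last dot is purely numeric.
    if sep and version.isdigit():
        return base
    return ensembl_id


# Two versioned IDs from the thread plus one ID that has no suffix.
ids = ["ENSG00000269821.1", "ENSG00000258492.1", "ENSG00000141510"]
unversioned = [strip_version(i) for i in ids]
print(unversioned)
```

Note that the two KCNQ1OT1 IDs are still different after stripping, so removing the suffix does not by itself reconcile annotations from different Ensembl releases. That is why sticking to one reference release for the whole project matters.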

7.8 years ago

You could use an R package like topGO or one of the Babelomics enrichment tools.
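
To make the "what should I do next" step more concrete, here is a minimal Python sketch of a plain over-representation (hypergeometric) test, assuming you already have a gene-to-GO mapping such as one exported from BioMart. It is not the algorithm of topGO or Babelomics specifically; the function names, gene labels and term labels are placeholders invented for the example, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def enrich(gene2go, background, selected):
    """Return {term: (hits, p_value)} for terms found among the selected genes."""
    background = set(background)
    selected = set(selected) & background
    # Invert the mapping: term -> background genes annotated with it.
    term2genes = {}
    for gene in background:
        for term in gene2go.get(gene, ()):
            term2genes.setdefault(term, set()).add(gene)
    results = {}
    for term, genes in term2genes.items():
        hits = len(genes & selected)
        if hits:
            p = hypergeom_pvalue(hits, len(selected), len(genes), len(background))
            results[term] = (hits, p)
    return results


# Toy data: six background genes, two hypothetical terms.
gene2go = {
    "geneA": {"GO:X"},
    "geneB": {"GO:X"},
    "geneC": {"GO:X", "GO:Y"},
    "geneD": {"GO:Y"},
}
background = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
results = enrich(gene2go, background, selected=["geneA", "geneB"])
print(results)
```

In practice the background should be all genes that were tested in the differential expression analysis, not the whole genome, and the p-values need correcting for the number of terms tested.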

7.8 years ago
EagleEye 7.6k

Suggestion:

1) Use BioMart to convert your Ensembl (ENSG) IDs into gene symbols or Entrez Gene IDs (check steps here).

2) Use GeneSCF to do enrichment analysis.
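
Step 1 above could look roughly like this in Python once you have downloaded a tab-separated export from BioMart. This is a sketch under assumptions: the column headers "Gene stable ID" and "Gene name" are assumed to match your export, the helper names are made up, and the rows are invented for illustration (only the KCNQ1OT1 pairing comes from this thread).

```python
import csv
import io

# Toy stand-in for a BioMart TSV export; replace with open("your_export.txt").
biomart_tsv = (
    "Gene stable ID\tGene name\n"
    "ENSG00000269821\tKCNQ1OT1\n"
    "ENSG_EXAMPLE_1\tGENE_A\n"
    "ENSG_EXAMPLE_2\t\n"
)


def load_mapping(handle):
    """Map Ensembl gene ID -> symbol, skipping rows with an empty symbol."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Gene stable ID"]: row["Gene name"]
        for row in reader
        if row["Gene name"]
    }


def convert(ids, mapping):
    """Split IDs into (converted symbols, IDs left without a symbol)."""
    symbols = [mapping[i] for i in ids if i in mapping]
    unmapped = [i for i in ids if i not in mapping]
    return symbols, unmapped


mapping = load_mapping(io.StringIO(biomart_tsv))
symbols, unmapped = convert(
    ["ENSG00000269821", "ENSG_EXAMPLE_2", "ENSG_EXAMPLE_3"], mapping
)
print(symbols, unmapped)
```

Keeping the list of unmapped IDs is worth the extra line, because it tells you how much of your gene list is silently dropped before the enrichment step.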


But many Ensembl gene IDs don't have corresponding Entrez IDs.


All Ensembl IDs will have corresponding GeneSymbols. You can use that information.

7.8 years ago
Benn 8.4k

With goseq in R you can use Ensembl IDs.

Or clusterProfiler, which has a good tutorial:

http://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html
