Question

Advice needed for GO enrichment

9.0 years ago

dankwc2000 ▴ 20

Hello,

I want to use GO enrichment (http://geneontology.org/) for the first time with my E. coli proteomic hits. I have 211 IDs, all from UniProt. After submitting all 211, only 4 were mapped.

The 4 IDs that mapped successfully are:

P39356
P0A8A0
P0AAB8
P0AAI3

Here is a sample list of IDs that didn't match:

P0A954
A0A061KCV1
P0AGK6
A0A066SYL5

I have also tried converting the UniProt IDs to UniProt gene names and then searching with those on GO, which increased the number of mapped IDs to 129 out of 211.

A sample of gene names that mapped successfully:

ampC
accA
yeaG
atpF

Here is a sample of gene names that did not map successfully:

bla
groL1
hlyA
AC789_1c11180
BY96_12590

If someone could shed some light on why these are not mapping successfully, that would be great. Thanks.

The list of IDs I want to submit is located here

Proteomics • 2.0k views

updated 2.7 years ago by Ram 44k • written 9.0 years ago by dankwc2000 ▴ 20

Please move the IDs to a GitHub gist. Pasting a list here only serves to drive people away from the question - it is excessive information not necessary to the question.

A better approach would be to provide 2 lists: the IDs that map, and a small sample (maybe 5 items) of the IDs that don't map. That way, you can look for attributes that are common among members of each group and different between the groups, and figure out the root cause.
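The comparison suggested above can be sketched in a few lines of Python. This is only an illustration, not a diagnosis: it tabulates surface attributes (whether the ID matches the accession pattern published in the UniProt help pages, its length, and its first three characters) for the sample IDs from the question. The function names `describe` and `summarize` are made up for this sketch.

```python
import re
from collections import Counter

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

def describe(identifier):
    """Return surface attributes of one ID: (is accession, length, 3-char prefix)."""
    is_accession = bool(UNIPROT_ACCESSION.match(identifier))
    return (is_accession, len(identifier), identifier[:3])

def summarize(identifiers):
    """Count how many IDs share each attribute tuple."""
    return Counter(describe(i) for i in identifiers)

# Sample IDs copied from the question.
mapped = ["P39356", "P0A8A0", "P0AAB8", "P0AAI3"]
unmapped = ["P0A954", "A0A061KCV1", "P0AGK6", "A0A066SYL5"]

for label, group in (("mapped", mapped), ("unmapped", unmapped)):
    print(label)
    for (is_acc, length, prefix), count in sorted(summarize(group).items()):
        print(f"  accession={is_acc} length={length} prefix={prefix} count={count}")
```

On these samples, the only visible difference is that two of the unmapped IDs are 10-character accessions starting with A0A. That alone does not explain the unmapped 6-character ones, so the full lists would need the same treatment.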

5.0 years ago by Ram 44k

Hi,

There are a lot of different tools that offer GO enrichment, such as DAVID, PANTHER, GraphiteWeb, etc., so you can try those. I would also suggest converting your list from proteins to the gene symbols (Entrez) of the genes that encode those proteins.

From my experience, I can also tell you that sometimes you have a very long list of valid IDs, but the statistics are too weak to map anything correctly.

Best regards!

updated 5.0 years ago by Ram 44k • written 9.0 years ago by orzech_mag ▴ 230

Ram · Accepted Answer · 2016-01-07

If the genome of your E. coli strain is publicly available, you can get the GO annotation from QuickGO. Basically, download the annotation for your strain and then extract the genes in your list.
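The "download, then extract" step can be sketched as follows, assuming the annotation was exported in GAF format (tab-separated, comment lines starting with `!`, object ID in column 2, GO ID in column 5). The function name `go_terms_for` and the toy rows are made up for illustration; the rows are not real annotations.

```python
from collections import defaultdict

def go_terms_for(gaf_lines, wanted_ids):
    """Collect the GO IDs annotated to each wanted object ID in GAF lines."""
    wanted = set(wanted_ids)
    terms = defaultdict(set)
    for line in gaf_lines:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row
        object_id, go_id = cols[1], cols[4]
        if object_id in wanted:
            terms[object_id].add(go_id)
    return dict(terms)

# Toy rows with placeholder IDs, only to show the expected layout.
sample = [
    "!gaf-version: 2.2",
    "\t".join(["UniProtKB", "ACC_A", "geneA", "", "GO:toy0001"]),
    "\t".join(["UniProtKB", "ACC_A", "geneA", "", "GO:toy0002"]),
    "\t".join(["UniProtKB", "ACC_B", "geneB", "", "GO:toy0001"]),
]

print(go_terms_for(sample, ["ACC_A"]))

# With a real download (the file name here is hypothetical):
# with open("annotations.gaf") as handle:
#     annotations = go_terms_for(handle, my_uniprot_ids)
```

IDs from your list that are missing from the returned dictionary have no annotation in the downloaded file, which is itself a useful check on why they fail to map.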

Otherwise, you could try annotating your genes with Blast2GO or running them through InterProScan (there is also a standalone version of the latter).

For the enrichment itself there are many nice tools. I personally like the ones in Bioconductor (R) the most, e.g. topGO.
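To give a feel for what such tools compute for each GO term, here is a minimal sketch of a one-sided hypergeometric over-representation test in plain Python. It is not topGO's implementation, and the function name and all numbers except the 129 mapped genes from the question are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) where X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     genes in the study list
    hits:       study-list genes carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 4000 background genes, 40 carry the term,
# 129 genes in the study list (the mapped count above), 5 of them carry it.
p = enrichment_p_value(4000, 40, 129, 5)
print(f"p = {p:.3g}")
```

Real tools run this kind of test for every term and then correct for multiple testing, so a short or poorly mapped gene list quickly loses statistical power.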

You can get an idea by checking previous posts: