Advice needed for GO enrichment


9.4 years ago

dankwc2000 ▴ 20

Hello,

I want to use GO enrichment (http://geneontology.org/) for the first time with my E. coli proteomics hits. I have 211 UniProt IDs, but after submitting them only 4 are mapped.

The 4 IDs that mapped successfully are:

P39356
P0A8A0
P0AAB8
P0AAI3

Here is a sample list of IDs that didn't match:

P0A954
A0A061KCV1
P0AGK6
A0A066SYL5

I have also tried converting the UniProt IDs to UniProt gene names and searching those on GO instead, which improved the number of mapped IDs to 129 out of 211.

A sample of gene names that mapped successfully:

ampC
accA
yeaG
atpF

Here is a sample of gene names that did not map:

bla
groL1
hlyA
AC789_1c11180
BY96_12590

If someone could shed some light on why the IDs are not mapping, that would be great. Thanks.

The full list of IDs I want to submit is below:

	A0A0E2TM38
	P0AED2
	A0A061L549
	A0A066SXV7
	P0A6B3
	A0A024L7L1
	A0A066T8X6
	B7LIW9
	A0A025FZ31
	A0A0A1A6C7
	A0A066T289
	A0A023Z5Q1
	P0A4U5
	A0A0A0GRX6
	A0A066T2U8
	A0A066T2K7
	B7MI03
	A0A061L2L2
	B7NCN8
	A0A066SYL5
	A0A0D8VW66
	A7ZSC7
	A0A0A1AAF9
	P0AAB8
	A0A061YGI3
	A0A0E0V5W3
	A0A061YFX7
	A0A026GYM0
	C8CGJ0
	A0A023L2I8
	A0A028AHG2
	B7UIL1
	A0A0B1F2D6
	A0A066T1C1
	P0AB82
	C3TIN2
	B7MV94
	A0A0D8W5Y7
	A0A066SU35
	A0A066SPP0
	A0A0A1A0U0
	A0A061YKW8
	A0A066R9J0
	E2XHV6
	B7MKM9
	A0A027TKW3
	P0AAI3
	P39356
	B7UI43
	C6EFG9
	A0A028ED46
	A0A0E0V778
	A0A061YA81
	A0A0C8R7N1
	A0A066T0N8
	A0A0A2RRQ4
	A0A066RHH7
	A0A061L049
	A0A0C5EZJ5
	A0A061L5F7
	A0A023YSY5
	P0A954
	P0DMC8
	A0A0A0FY80
	A0A066SZS3
	A0A061YP03
	B1LJ43
	A0A0A1AAR5
	A0A066T149
	A0A061YFJ3
	A0A0A6RYT3
	A7ZU66
	A0A0E2TRU3
	A0A061KX09
	B1VCI2
	A0A061Y7T9
	A7ZUL0
	A7ZTU4
	A0A0E2LNZ8
	K4XHA3
	B7MC90
	C3TJ62
	A7ZTU8
	A0A061KQ46
	A0A066SN12
	P0AAI7
	C3SJ47
	A0A024L4V5
	A0A024L616
	H9URL8
	R6VNF8
	A0A066SZY4
	A7ZTJ2
	P0ABC5
	A0A0E1LC25
	A0A061KAB2
	A0A061KI80
	B3HJ98
	A0A066T1G8
	A0A0D6IKJ0
	A0A077Z3W0
	A8AQJ0
	A0A061KCV1
	B1P7H4
	A1AJ51
	E2QLY1
	A0A066RGX5
	A0A0E1SWP7
	Q1RFA5
	A0A061L7G7
	H4J4S0
	A7ZTU3
	C7S9T0
	A7ZPD1
	A0A0E2TRQ1
	A0A0E2TMC9
	A0A037YGF3
	E2QGQ2
	Q0TKK5
	A0A0E0U2X6
	A0A0A1ADZ6
	B7N0Y3
	B7N2J0
	P13661
	A0A066SWG2
	A0A0E0TX22
	A0A0B0W2Q7
	A0A061KDP5
	A0A061YDR4
	A0A027U8D8
	A0A028AFZ4
	B1LJ51
	Q5MAJ8
	B7N5S1
	A0A066SWC5
	P0ACJ2
	E2QFX6
	A0A066SSX9
	A0A066T686
	A0A024L8V5
	P0A4L6
	A0A0B0XUI7
	P0AEU9
	P0AFL5
	B7MQ57
	A7ZK01
	W9AM67
	A0A066QLF8
	A0A0E1LA67
	A0A0B0VCE8
	A0A0E1LDD5
	A0A0A1A6P7
	B7MQR5
	C3TF32
	Q1R2T5
	Q8VR39
	A0A061L3B1
	A0A0D6IRY9
	A0A0E0VCW7
	A1AB32
	H9UQ82
	R6U580
	A0A066SNK8
	B7UIS4
	C3T5A2
	A0A027U015
	E2XC85
	A0A061YL54
	A0A061YKR0
	A0A061KHZ9
	A0A066SVB9
	A0A066SS32
	A0A0E1SXW8
	A0A061Y578
	A0A0E1T6X3
	A0A061KA72
	A0A066T4W3
	A0A023LIJ5
	A0A066SS33
	P33219
	P0AE93
	E2QQC2
	A0A061KE35
	P0AGK6
	E2QIN3
	A0A066T755
	B7MX29
	A0A075L5G2
	A0A027ZL67
	A0A061YI78
	A7ZS64
	A7ZHR1
	A0A0E1LIN1
	A0A025FQJ4
	A0A066SXJ0
	A0A0F3SJZ5
	A0A061YC46
	A0A061YEH2
	P0A9Z3
	A7ZTR0
	A0A061YH54
	A7ZHS5
	B7N7L4
	B1LIL4
	A0A061KXL2
	A0A066RDY2
	J7Q7B1
	A0A0A6VHK7
	A0A027TTG1
	P0A8A0
	A8ARN6


Proteomics • 2.2k views

updated 3.2 years ago by Ram 45k • written 9.4 years ago by dankwc2000 ▴ 20


Please move the IDs to a GitHub gist. Pasting a list here only serves to drive people away from the question - it is excessive information not necessary to the question.

A better approach would be to provide 2 lists - the IDs that map, and a small sample (maybe 5 items) of the IDs that don't map. That way, you can look for attributes that are common within each group and differ between the groups, and figure out the root cause.
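As a first pass at that comparison, here is a minimal Python sketch that labels each accession by its shape (length and first letter) and tallies the labels per group. The regular expression is the accession pattern UniProt documents; the `describe` helper and the labels it returns are made up for illustration. Accession shape alone will not explain every failure (two of the unmapped IDs look just like the mapped ones), so treat this as triage rather than a diagnosis.

```python
import re
from collections import Counter

# UniProt's documented accession pattern: 6- or 10-character accessions.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def describe(accession: str) -> str:
    """Return a coarse label for an ID: its length and first letter."""
    if not ACCESSION_RE.fullmatch(accession):
        return "not a UniProt accession"
    return f"{len(accession)}-char, starts with {accession[0]}"

# The two sample lists given in the question.
mapped = ["P39356", "P0A8A0", "P0AAB8", "P0AAI3"]
unmapped = ["P0A954", "A0A061KCV1", "P0AGK6", "A0A066SYL5"]

mapped_profile = Counter(describe(a) for a in mapped)
unmapped_profile = Counter(describe(a) for a in unmapped)

print("mapped:  ", dict(mapped_profile))
print("unmapped:", dict(unmapped_profile))
```

The same tally can be run over the full 211-ID list to see how many IDs fall into each shape before looking up individual entries.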

5.5 years ago by Ram 45k


Hi,

There are a lot of different tools offering GO enrichment, such as DAVID, PANTHER, GraphiteWeb, etc., so you can try those. I would also suggest converting your list from proteins to the gene symbols (Entrez) of the genes that encode those proteins.

From my experience, I can also tell you that sometimes you have a very long list of valid IDs, but the statistics are too weak to map anything correctly.

Best regards!

updated 5.5 years ago by Ram 45k • written 9.4 years ago by orzech_mag ▴ 230


9.4 years ago

dago ★ 2.8k

If the genome of your E. coli strain is publicly available, you can get the GO annotation from QuickGO. Basically, download the annotation for your strain and then extract the genes in your list.
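The "extract the genes in your list" step could look roughly like the Python sketch below, assuming the annotation was downloaded in GAF format (tab-separated, `!` comment lines, accession in column 2, GO ID in column 5). The function name `go_terms_for` is hypothetical, and the sample rows are made-up placeholders, not real annotations for these proteins.

```python
from collections import defaultdict

def go_terms_for(gaf_lines, wanted):
    """Map each wanted accession to the set of GO IDs annotated to it.

    gaf_lines: any iterable of GAF text lines (e.g. an open file).
    wanted:    iterable of UniProt accessions from the hit list.
    """
    wanted = set(wanted)
    hits = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # malformed row
        accession, go_id = fields[1], fields[4]
        if accession in wanted:
            hits[accession].add(go_id)
    return dict(hits)

# Placeholder rows for illustration only; the symbols and GO IDs are invented.
sample_gaf = [
    "!gaf-version: 2.2\n",
    "UniProtKB\tP0A8A0\tplaceholder1\t\tGO:0000001\n",
    "UniProtKB\tP0A8A0\tplaceholder1\t\tGO:0000002\n",
    "UniProtKB\tQ00000\tplaceholder2\t\tGO:0000003\n",
]

result = go_terms_for(sample_gaf, ["P0A8A0", "A0A061KCV1"])
missing = {"P0A8A0", "A0A061KCV1"} - result.keys()
print(result)
print("no annotation found for:", sorted(missing))
```

Accessions that end up in `missing` have no rows in the downloaded file, which is worth checking before blaming the enrichment tool.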

Otherwise, you could try to annotate your genes with Blast2GO or run them through InterProScan (there is also a standalone version of the latter).

There are many nice tools for the enrichment itself. I personally like the ones in Bioconductor (R) the most, e.g. topGO.
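To make the statistics behind such tools concrete, here is a minimal Python sketch of a one-sided hypergeometric over-representation test for a single GO term. This is not topGO (which is an R package with more elaborate algorithms); the function name and the example counts are invented for illustration, and a real analysis would also need a background gene set and multiple-testing correction.

```python
from math import comb

def enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail probability P(X >= k).

    N: number of genes in the background set
    K: background genes annotated with the GO term
    n: number of genes in the hit list
    k: hit-list genes annotated with the GO term
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Toy numbers: 10 background genes, 3 carry the term, and all 3 hits carry it.
p = enrichment_pvalue(k=3, n=3, K=3, N=10)
print(p)
```

With these toy counts there is exactly one way out of 120 to draw all three annotated genes, so the p-value is 1/120.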

You can get an idea checking previous posts:

updated 5.5 years ago by Ram 45k • written 9.4 years ago by dago ★ 2.8k