Question

Choosing a good GO analysis

Asked 8.2 years ago by benoahb

I have several sets of roughly 100 proteins each, identified by mass spectrometry (MS) as up- or down-regulated under different conditions. I would like to give a general overview of their functions. My first choice would be a high-level pie chart (biological process and maybe molecular function) plus a short, precise description (pathway/protein class) for each protein in a table.

From what I've seen, the best options are PANTHER or DAVID. How should I choose between them? My main interest is in biological process (BP)/pathways, but I also care about molecular function (MF)/protein class (PC).

On PANTHER:

When I look at the gene list, some proteins are missing a "protein class" and others a "biological process"; sometimes PC gives the more accurate description, sometimes BP does.
When I look at the BP pie chart, "immune response" and "response to stimulus" are separate slices even though, for my set of proteins, they amount to the "same thing", while right next to them sits "cellular process", which is far too general.
To cite one example I stumbled on: the PC of OAS is "defense/immunity protein" while its BP is "response to stimulus" (http://pantherdb.org/genes/gene.do?acc=HUMAN|HGNC=8088|UniProtKB=Q9Y6K5). As far as I know, OAS should belong to the BP "immune response".

On DAVID, I get something similar to PANTHER's "statistical overrepresentation" test:

If I choose GO > BP direct, I only get a list of functions that are somewhat redundant, all driven by the same core set of proteins.
If I choose GO > BP all, I end up with an endless, overly specific and again quite redundant list, which simply cannot be shown on a pie chart.

Where do I go from here? One option would be to:

pick a handful of specific, statistically supported functions/pathways that represent my sample, allowing mild overlap between them but without showing every variant of a few overlapping pathways,
and lump the under-represented functions/pathways (those below a p-value or protein-count threshold) into a generic "diverse" category.
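The two-step selection above could at least be scripted so that the thresholds are explicit and reproducible. A minimal sketch, assuming enrichment results are already available as (term, p-value, genes) tuples; the term names, gene sets and the thresholds `max_p`, `min_genes` and `max_overlap` (a Jaccard index) are hypothetical choices, and picking terms greedily in order of best p-value is just one option:

```python
# HYPOTHETICAL enrichment output: (term, p-value, genes from my list in that term)
RESULTS = [
    ("term A", 1e-8, {"g1", "g2", "g3", "g4"}),
    ("term B", 1e-7, {"g1", "g2", "g3"}),
    ("term C", 1e-4, {"g5", "g6", "g7"}),
    ("term D", 0.03, {"g8", "g9", "g10"}),
    ("term E", 1e-3, {"g11", "g12"}),
]

def jaccard(a, b):
    """Share of genes two terms have in common (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

def summarize(results, max_p=0.01, min_genes=3, max_overlap=0.5):
    """Split terms into kept, redundant (overlap a kept term) and 'diverse'."""
    kept, redundant, diverse = [], [], []
    for term, p, genes in sorted(results, key=lambda r: r[1]):  # best p-value first
        if p > max_p or len(genes) < min_genes:
            diverse.append(term)      # weak signal: lumped into "diverse"
        elif any(jaccard(genes, g) > max_overlap for _, g in kept):
            redundant.append(term)    # mostly the same genes as a kept term
        else:
            kept.append((term, genes))
    return [term for term, _ in kept], redundant, diverse

kept, redundant, diverse = summarize(RESULTS)
print(kept)       # ['term A', 'term C']
print(redundant)  # ['term B']
print(diverse)    # ['term E', 'term D']
```

The choices are still the user's, but they are written down as parameters that can be reported alongside the chart.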

But then, everything ends up being chosen by the user. However objective my choices may be, where's the science in that?

Tag: GO

Have you tried the ClueGO app for Cytoscape?

Reply by Lila M, 8.2 years ago

Still waiting out my "couple of days" to get a licence :)

But the preview seems quite interesting, thanks!

Reply by benoahb, 8.2 years ago

But then, everything ends up being chosen by the user. However objective my choices may be, where's the science in that?

Very true. Worse, there are many more tools that do roughly the same thing, just slightly differently, or with a different database.
In the end, people will just fish out the result that best confirms their initial hypothesis.

But I cannot imagine that this GO analysis is the endpoint of your work. What would be the next step?

Reply by WouterDeCoster, 8.2 years ago

Sadly, it doesn't go much further than that. Due to funding issues we can't do a wet-lab follow-up.

The endpoint is a comparison of the strong fold-change hits, functional pie charts and network builds across our different conditions. That's why I don't just pick the first functional pie chart I see, and why I'm looking into it a bit more than usual.

Reply by benoahb, 8.2 years ago

score 1 · Answer 1 · 2017-03-16

Answered 8.2 years ago by Jean-Karim Heriche

If you're interested in summarizing your gene list with GO terms, you could select a relevant set of terms of appropriate specificity (e.g. cell cycle, protein secretion, ..., plus an "other processes" category), then find out which genes are annotated with these terms or their children in GO. If you want a pie chart, you need non-overlapping counts. For this, you can order the terms by importance/relevance to your study and count each gene only in the first category it falls into.
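A minimal sketch of this priority-based counting, assuming each gene's GO annotations and the ontology's parent links have already been loaded into Python dictionaries. The ontology fragment, gene names and category order below are hypothetical toy data, not real GO content:

```python
from collections import Counter

# HYPOTHETICAL toy ontology fragment: child term -> set of parent terms.
PARENTS = {
    "G2/M transition": {"mitotic cell cycle"},
    "mitotic cell cycle": {"cell cycle"},
    "cell cycle": {"cellular process"},
    "protein secretion": {"protein transport"},
    "protein transport": {"transport"},
    "transport": {"cellular process"},
    "immune response": {"response to stimulus"},
    "response to stimulus": set(),
    "cellular process": set(),
}

def ancestors_or_self(term):
    """Return the term plus every term reachable by following parent links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen

def pie_counts(gene_terms, categories, other="other processes"):
    """Count each gene once, in the first (highest-priority) category it falls under."""
    counts, assignment = Counter(), {}
    for gene, terms in gene_terms.items():
        covered = set()
        for term in terms:
            covered |= ancestors_or_self(term)
        label = next((c for c in categories if c in covered), other)
        assignment[gene] = label
        counts[label] += 1
    return counts, assignment

# Categories ordered by relevance to the study (most relevant first).
CATEGORIES = ["immune response", "cell cycle", "protein secretion"]

# HYPOTHETICAL gene annotations.
GENES = {
    "geneA": {"G2/M transition"},
    "geneB": {"immune response", "protein secretion"},
    "geneC": {"protein secretion"},
    "geneD": {"transport"},
}

counts, assignment = pie_counts(GENES, CATEGORIES)
print(dict(counts))
```

Here geneB counts only toward "immune response" because that category is ranked first, even though it is also annotated with protein secretion; reordering `CATEGORIES` changes the pie chart, which is exactly the subjective choice discussed in the question.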

OK, that sounds like a good way to go, thank you for your input!

Are you suggesting a tool I'm not aware of, or should I get my hands dirty and do it one by one?

Sorry, I'm still learning how this whole thing works.

Reply by benoahb, 8.2 years ago

The term selection is manual insofar as you hand-select appropriate terms. Then I would write a script that takes a gene list as input and collects its annotations. The main difficulty lies in navigating the ontology: e.g. if you picked cell cycle as a relevant category and you have a gene annotated as involved in the G2/M transition, you want to be able to tell that this is a child term of the cell cycle term. I would suggest using the R Bioconductor GO.db package for this.
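Since R is a sticking point later in the thread, here is a rough Python sketch of such a script that does not use GO.db. It assumes the ontology is available in OBO format (such as the go-basic.obo file distributed by the Gene Ontology Consortium) and annotations in GAF 2.x format; the inline snippets, the `TOY:` identifiers and the gene names are hypothetical stand-ins, and only `is_a` and `part_of` edges are followed:

```python
# HYPOTHETICAL ontology snippet in OBO format (real files are much larger).
OBO_TEXT = """\
format-version: 1.2

[Term]
id: TOY:0001
name: cell cycle

[Term]
id: TOY:0002
name: mitotic cell cycle
is_a: TOY:0001 ! cell cycle

[Term]
id: TOY:0003
name: G2/M transition
relationship: part_of TOY:0002 ! mitotic cell cycle
"""

def parse_obo_parents(lines):
    """Map each [Term] id to its parent ids via is_a and part_of edges."""
    parents, current, in_term = {}, None, False
    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            in_term, current = (line == "[Term]"), None
        elif in_term and line.startswith("id: "):
            current = line[len("id: "):]
            parents.setdefault(current, set())
        elif current and line.startswith("is_a: "):
            parents[current].add(line[len("is_a: "):].split()[0])
        elif current and line.startswith("relationship: part_of "):
            parents[current].add(line[len("relationship: part_of "):].split()[0])
    return parents

def descendants_or_self(root, parents):
    """Return root plus all terms that reach it through parent edges."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    seen, stack = set(), [root]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(children.get(term, ()))
    return seen

def read_gaf(lines, aspect="P"):
    """Return {gene symbol: set of GO ids} from GAF 2.x lines for one aspect."""
    annotations = {}
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # GAF header/comment lines start with '!'
        cols = line.rstrip("\n").split("\t")
        symbol, qualifier, go_id, asp = cols[2], cols[3], cols[4], cols[8]
        if asp != aspect or "NOT" in qualifier.split("|"):
            continue  # skip other aspects and negated annotations
        annotations.setdefault(symbol, set()).add(go_id)
    return annotations

def toy_gaf_row(symbol, qualifier, go_id, aspect):
    """Build a HYPOTHETICAL 17-column GAF line with placeholder values."""
    cols = ["UniProtKB", "ID_" + symbol, symbol, qualifier, go_id, "REF",
            "IEA", "", aspect, "", "", "protein", "taxon:9606", "20170101",
            "SRC", "", ""]
    return "\t".join(cols)

GAF_LINES = [
    "!gaf-version: 2.2",
    toy_gaf_row("GENE1", "involved_in", "TOY:0003", "P"),
    toy_gaf_row("GENE2", "involved_in", "TOY:0001", "P"),
    toy_gaf_row("GENE3", "NOT|involved_in", "TOY:0002", "P"),
    toy_gaf_row("GENE4", "enables", "TOY:0009", "F"),
]

my_genes = {"GENE1", "GENE2", "GENE3", "GENE4"}
parents = parse_obo_parents(OBO_TEXT.splitlines())
annotations = read_gaf(GAF_LINES)
cell_cycle_terms = descendants_or_self("TOY:0001", parents)
hits = sorted(g for g in my_genes if annotations.get(g, set()) & cell_cycle_terms)
print(hits)  # ['GENE1', 'GENE2']
```

GENE1 is picked up through its G2/M-like child term, GENE3 is excluded because its annotation is negated, and GENE4 only has a molecular function annotation.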

Reply by Jean-Karim Heriche, 8.2 years ago

OK. The problem is that I haven't had the chance to learn R yet, and my time is limited.

Is there any way around R, besides using the pie chart from PANTHER?

Reply by benoahb, 8.2 years ago

score 0 · Answer 2 · 2017-03-23

Answered 8.2 years ago by benoahb

OK, I found what I was looking for in another thread: DAVID's clustering function!
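DAVID's Functional Annotation Clustering groups redundant terms according to how strongly they share genes, which its documentation describes in terms of kappa statistics. The sketch below shows only that core similarity measure, Cohen's kappa between two terms' gene memberships over a gene universe; it is not DAVID's clustering algorithm, and the gene sets are hypothetical:

```python
def kappa(term_a, term_b, universe):
    """Cohen's kappa for agreement between two terms' gene memberships."""
    n = len(universe)
    both = len(term_a & term_b & universe)
    only_a = len((term_a - term_b) & universe)
    only_b = len((term_b - term_a) & universe)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement from the marginals: P(in A)P(in B) + P(not A)P(not B)
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1:
        return float("nan")  # both terms cover all or none of the universe
    return (observed - expected) / (1 - expected)

# HYPOTHETICAL gene universe and term memberships.
UNIVERSE = {f"g{i}" for i in range(1, 11)}
TERM_A = {"g1", "g2", "g3", "g4"}
TERM_B = {"g1", "g2", "g3"}
TERM_C = {"g5", "g6", "g7"}

print(round(kappa(TERM_A, TERM_B, UNIVERSE), 2))  # 0.78
print(round(kappa(TERM_A, TERM_C, UNIVERSE), 2))  # -0.52
```

Terms with a high kappa (like A and B) are candidates for the same cluster, so only one representative needs to appear in a summary chart.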
