Question

Selecting the background for GO enrichment analysis (proteomics)

5.7 years ago

Sebastian Hesse ▴ 350

I'm using clusterProfiler for GO enrichment of differentially expressed proteins from different groups. It appears that the most suitable background for enrichment is all proteins that "potentially could be in the list of the differentially expressed proteins".

My issue is that I am not 100% sure what this means, as I go through various rounds of selecting proteins before the differential expression analysis, e.g. I select only proteins quantified with >1 peptide.

Would you set the background for enrichment to only the full list of proteins used in the differential expression analysis (i.e. the most stringently filtered ones), to the list of proteins before filtering (i.e. including proteins quantified with 1 peptide), or maybe even to another list, e.g. the complete human proteome or the cell's proteome as published by others before me?

Thanks a lot! Sebastian

PS: so far I have been using the heavily filtered list, which includes only the proteins used for the differential expression analysis.

GO enrichment proteomics clusterProfiler • 6.4k views

Updated 4.7 years ago by janetscully • written 5.7 years ago by Sebastian Hesse ▴ 350

Oh my God!!! I've been doing enrichment with the whole proteome of Arabidopsis (I work with plants). Just to be sure: when it comes to annotating gene IDs, you DO use the whole-genome information, right? (I am learning by doing.)

Thanks a lot!!

Janet

4.7 years ago by janetscully • 0

What you're asking is unclear. If you're asking about how to define the background/reference list for enrichment analysis, see my answer below. Otherwise, consider opening a new question.

4.7 years ago by Jean-Karim Heriche 27k

score 0 · Answer 1 · 2019-04-26

5.7 years ago

Jean-Karim Heriche 27k

The background should be the universe of all genes interrogated in the experiment, i.e. all genes that had a chance of making it into the list of genes of interest (e.g. the differentially expressed ones). For example, if your experiment only tested expression levels of transcription factors, then your background list would be limited to the transcription factors you tested.
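To make "universe" concrete, here is a minimal sketch of a one-sided hypergeometric (over-representation) test, the kind of test clusterProfiler's `enrichGO()` runs; in clusterProfiler the background list is what you pass as its `universe` argument. The function name `ora_pvalue` and the toy gene names are hypothetical. Note that hits and term members outside the background are simply dropped, because they never had a chance of being selected.

```python
from math import comb


def ora_pvalue(hits, background, term_genes):
    """One-sided hypergeometric p-value P(X >= k) for over-representation.

    Only genes in `background` are counted: hits and term members outside it
    are discarded, since they could never have been selected.
    """
    universe = set(background)
    hits = set(hits) & universe
    term = set(term_genes) & universe
    M = len(universe)       # all genes interrogated in the experiment
    n = len(term)           # interrogated genes annotated to the term
    N = len(hits)           # genes of interest (e.g. differentially expressed)
    k = len(hits & term)    # genes of interest annotated to the term
    # Count the draws with k, k+1, ..., min(n, N) annotated genes,
    # then divide by the total number of possible draws of size N.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / comb(M, N)


# Toy example: 10 tested genes, 4 hits, 3 of the hits annotated to the term.
background = [f"g{i}" for i in range(1, 11)]
term_genes = ["g1", "g2", "g3", "g4", "g99"]  # g99 was not tested, so it is ignored
hits = ["g1", "g2", "g3", "g10"]
print(ora_pvalue(hits, background, term_genes))  # 25/210, about 0.119
```

Changing `background` changes M and n, and therefore the p-value, which is why the choice of universe matters.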

5.7 years ago by Jean-Karim Heriche 27k

This doesn't make sense to me, since we don't know the universe of TFs. How could you filter your dataset (background) by categories for which we have limited knowledge?

4.7 years ago by ATCG ▴ 400

The background is not the list of things of which you have knowledge, it's the list of what you tested/experimented on. If the experiment measured expression of 100 TFs then the background list for the purpose of enrichment analysis is the list of the 100 TFs you tested. It is impossible that any other TFs could be identified as interesting since they were not tested. Including them in the background creates a bias in the analysis.
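A small numerical illustration of that bias, assuming a one-sided hypergeometric test. The experiment tests 100 TFs and all 10 hits are TFs, so with the correct background the term "TF" cannot be enriched. With a whole-genome background, the same hits look extremely enriched. The genome-wide counts (20,000 genes, 1,500 TFs) are hypothetical round numbers, and `hypergeom_tail` is a hypothetical helper name.

```python
from math import comb


def hypergeom_tail(k, M, n, N):
    """P(X >= k) when N items are drawn from M, of which n carry the annotation."""
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / comb(M, N)


# Background = the 100 TFs actually tested; all 10 hits are TFs,
# but so is every tested gene, so there is no enrichment.
p_tested = hypergeom_tail(k=10, M=100, n=100, N=10)

# Same 10 hits against a hypothetical whole-genome background of
# 20,000 genes containing 1,500 TFs, most of which were never tested.
p_genome = hypergeom_tail(k=10, M=20000, n=1500, N=10)

print(p_tested)  # 1.0
print(p_genome)  # about 5.5e-12
```

The spurious significance comes entirely from counting untested genes as "could have been hits".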

EDIT: To try and clarify further, if your proteomics experiment started from the whole sample and your mass-spectrometer was set to identify all peptides then any protein in the sample had a chance to be detected. The filtering is applied as one of the steps to identify the proteins of interest.
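A minimal sketch of how the lists could be built under that reading, assuming the proteins identified before any quality filter are used as the practical stand-in for "any protein in the sample" (a published proteome of the same cell type, as mentioned in the question, would be a broader alternative). The protein IDs, peptide counts and DE calls are hypothetical toy data.

```python
# Hypothetical peptide counts per identified protein.
peptides_per_protein = {"P1": 5, "P2": 1, "P3": 3, "P4": 1, "P5": 2, "P6": 4}

# Hypothetical differential-expression calls.
differentially_expressed = {"P1", "P6"}

# Background: every identified protein, i.e. everything that had a chance
# to be detected, before the >1-peptide filter is applied.
background = set(peptides_per_protein)

# The >1-peptide filter is treated as one step in selecting proteins of interest.
tested = {p for p, n in peptides_per_protein.items() if n > 1}
hits = differentially_expressed & tested

print(sorted(background))  # ['P1', 'P2', 'P3', 'P4', 'P5', 'P6']
print(sorted(tested))      # ['P1', 'P3', 'P5', 'P6']
print(sorted(hits))        # ['P1', 'P6']
```

Under this reading the enrichment test uses `background`, not `tested`; the filtered list only shapes `hits`.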

4.7 years ago by Jean-Karim Heriche 27k

I see. Thanks for clarifying. How would you do this for an RNA-Seq experiment where you survey the whole transcriptome?

4.7 years ago by ATCG ▴ 400

In this case, every transcript present in your sample has a chance of being detected (if no particular selection, such as polyA enrichment, is applied to the RNAs), so the background list should contain all the corresponding genes. Often we don't know which genes are actually expressed in a particular sample, so we just take all the genes in the genome; but sometimes we know, to some extent, which genes are not expressed, in which case it would make sense to exclude them.
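A minimal sketch of the two options described above, assuming you have the genome's gene IDs and, optionally, a set of genes known from prior work not to be expressed in this sample type. The gene IDs and the helper name `rnaseq_background` are hypothetical.

```python
def rnaseq_background(genome_genes, known_not_expressed=frozenset()):
    """Whole genome by default; drop genes known not to be expressed, if any."""
    return set(genome_genes) - set(known_not_expressed)


# Hypothetical gene IDs standing in for a genome annotation.
genome_genes = {"G1", "G2", "G3", "G4", "G5", "G6"}

# Default: nothing is known about expression, so use the whole genome.
background_default = rnaseq_background(genome_genes)

# Refined: G5 and G6 are (hypothetically) known not to be expressed here.
background_refined = rnaseq_background(genome_genes, {"G5", "G6"})

print(sorted(background_default))  # ['G1', 'G2', 'G3', 'G4', 'G5', 'G6']
print(sorted(background_refined))  # ['G1', 'G2', 'G3', 'G4']
```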

4.7 years ago by Jean-Karim Heriche 27k