Question

topGO R package - how to account for fold expression and p value

11 months ago

Sven • 0

Dear all,

I am relatively new to the world of gene ontology, but want to perform a GO analysis using the R package topGO. I have some questions to which I could not find an answer. My proteomics data contains p values, fold expression and number of unique peptides.

1) I could not manage to find a built-in list (gene2go parameter) that connects my uniprotids with the GO terms for rainbow trout. Thus, I simply downloaded such a list from uniprot and imported it to R. Is this approach fine?

2) For my analysis I would like to include all three values (p values, fold expression and number of unique peptides). Because my p value threshold is 0.01, I construct my selection function like this: selection <- function(allScore){ return(allScore < 0.01)} and use the Fisher statistic with the weight01 algorithm. Additionally, I would like to use only proteins that show a fold expression of more than 1.5 or less than 1/1.5, and ideally exclude proteins with fewer than two unique peptides. Can I trim my data before the analysis and exclude proteins based on fold change and number of unique peptides? I am afraid that this could interfere with the statistical analysis, since it would make my gene universe smaller. Alternatively, I could manually set the p value of those proteins to one and keep a selection function based only on the p value.

Thank you very much in advance!

R proteomics topGO enrichment • 1.0k views

score 1 · Answer 1 · 2024-08-14

1) I could not manage to find a built-in list (gene2go parameter) that connects my uniprotids with the GO terms for rainbow trout. Thus, I simply downloaded such a list from uniprot and imported it to R. Is this approach fine?

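Unless it's changed since I last used it, topGO is normally used with Bioconductor annotation packages such as org.Hs.eg.db for human or org.Mm.eg.db for mouse, which contain the GO mappings. For a species without such a package it can also take a custom gene-to-GO mapping (the annFUN.gene2GO annotation function together with the gene2GO argument you mention), so an imported UniProt list should be workable. There are also ways of creating custom annotation packages, but I never got that to work when I tried a few years ago (though I didn't try all that hard).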
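For the custom-mapping route, topGO's annFUN.gene2GO expects a named list with one character vector of GO IDs per gene. A minimal sketch of building such a list from an imported UniProt table, assuming the GO IDs sit in one column separated by "; " (the column names, accessions and table contents below are made up for illustration):

```r
# Toy stand-in for a UniProt export; column names and accessions are hypothetical.
uniprot_tab <- data.frame(
  Entry  = c("P00001", "P00002", "P00003"),
  GO_IDs = c("GO:0006412; GO:0008152", "GO:0006096", ""),
  stringsAsFactors = FALSE
)

# Keep only proteins that have at least one GO annotation.
annotated <- uniprot_tab[!is.na(uniprot_tab$GO_IDs) & nzchar(uniprot_tab$GO_IDs), ]

# Split the "; "-separated GO IDs into one character vector per protein
# and name each vector by its accession.
gene2GO <- strsplit(annotated$GO_IDs, ";\\s*")
names(gene2GO) <- annotated$Entry

# With topGO attached, this list would be passed along the lines of:
#   new("topGOdata", ontology = "BP", allGenes = allGenes,
#       annot = annFUN.gene2GO, gene2GO = gene2GO)
```

If the mapping is stored as a tab-separated file with comma-separated GO IDs, topGO's readMappings function can read it directly instead.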

2) For my analysis I would like to include all three values (p values, fold expression and number of unique peptides).

To me, it sounds like you're simply subsetting your dataset to define the proteins of interest, as is normal for enrichment analyses. The universe/background you test against should stay the full set of detected proteins; you generally shouldn't change it unless you have a specific research question that merits such a change.
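One way to apply all three criteria without shrinking the universe is to flag the proteins of interest yourself and hand topGO a named 0/1 factor covering every detected protein, which it accepts as allGenes for a predefined list. A rough sketch using the thresholds from the question; the data frame, its column names and its values are invented for illustration:

```r
# Hypothetical proteomics table; every detected protein stays in the universe.
proteins <- data.frame(
  id              = c("P00001", "P00002", "P00003", "P00004", "P00005"),
  p_value         = c(0.001, 0.005, 0.2, 0.003, 0.002),
  fold_change     = c(2.0, 1.1, 3.0, 0.5, 0.4),
  unique_peptides = c(3, 4, 5, 1, 2),
  stringsAsFactors = FALSE
)

# A protein counts as "interesting" only if it passes all three filters.
is_hit <- with(
  proteins,
  p_value < 0.01 &
    (fold_change > 1.5 | fold_change < 1 / 1.5) &
    unique_peptides >= 2
)

# Named factor with levels 0/1 over the whole universe.
allGenes <- factor(as.integer(is_hit), levels = c(0, 1))
names(allGenes) <- proteins$id

# With topGO attached, no selection function is needed for this form:
#   new("topGOdata", ontology = "BP", allGenes = allGenes,
#       annot = annFUN.gene2GO, gene2GO = gene2GO)
```

Whether proteins with fewer than two unique peptides belong in the background at all is a separate judgment call; the sketch keeps them in and only prevents them from counting as hits.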

If there is no package for your model system on Bioconductor, you have a few options: 1) create your own annotation package to use with topGO, 2) assign orthologs and use the package of a related species that does have an annotation package (e.g., zebrafish), or 3) use a tool that makes it easy to set a custom universe, such as g:Profiler (the gprofiler2 R package).

As an aside, you can subset data in R using the subset function. You don't need to write your own function.
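For example, pulling the filtered rows out of a table with subset might look like this (the table and column names are hypothetical):

```r
# Hypothetical results table.
results <- data.frame(
  id          = c("P00001", "P00002", "P00003"),
  p_value     = c(0.001, 0.2, 0.004),
  fold_change = c(2.0, 3.0, 1.1),
  stringsAsFactors = FALSE
)

# subset() evaluates the condition using the table's own columns
# and returns only the rows for which it is TRUE.
hits <- subset(results, p_value < 0.01 & (fold_change > 1.5 | fold_change < 1 / 1.5))
```

Note that subset drops the non-matching rows, so it suits listing the proteins of interest rather than defining the background. Also, when allGenes is a numeric score vector, topGO still expects a function for its geneSel argument.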