topGO R package - how to account for fold change and p-value
3 months ago
Sven • 0

Dear all,

I am relatively new to gene ontology analysis and want to perform a GO enrichment analysis with the R package topGO. However, I have some questions that I could not find answers to. My proteomics data contain p-values, fold changes, and the number of unique peptides for each protein.

1) I could not find a built-in mapping (the gene2GO parameter) that links my UniProt IDs to GO terms for rainbow trout, so I downloaded such a list from UniProt and imported it into R. Is this approach fine?

2) I would like my analysis to take all three values (p-values, fold changes, and number of unique peptides) into account. Because my p-value threshold is 0.01, I define the selection function as selection <- function(allScore) { return(allScore < 0.01) } and use the Fisher statistic with the weight01 algorithm. In addition, I would like to use only proteins with a fold change greater than 1.5 or less than 1/1.5, and ideally exclude proteins with fewer than two unique peptides. Can I trim my data before the analysis and exclude proteins based on fold change and number of unique peptides? I am afraid this could interfere with the statistical analysis, since it would make my gene universe smaller. Alternatively, I could manually set their p-values to 1 and use only the p-value-based selection function.
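A minimal base-R sketch of the "set their p-values to 1" alternative, assuming a hypothetical data frame `proteins` with columns `uniprot`, `pvalue`, `fold_change` and `peptides` (names and values are invented for illustration). Every quantified protein stays in `allScore`, so the universe is unchanged, but proteins failing the fold-change or peptide criteria can never pass the p-value selection function.

```r
# Hypothetical example data; replace with your own proteomics table
proteins <- data.frame(
  uniprot     = c("P1", "P2", "P3", "P4", "P5"),
  pvalue      = c(0.001, 0.004, 0.20, 0.008, 0.0005),
  fold_change = c(2.0, 1.2, 3.0, 0.5, 0.6),
  peptides    = c(3, 5, 4, 1, 2),
  stringsAsFactors = FALSE
)

# TRUE for proteins regulated more than 1.5-fold in either direction
# and supported by at least two unique peptides
passes_filters <- (proteins$fold_change > 1.5 | proteins$fold_change < 1 / 1.5) &
  proteins$peptides >= 2

# Keep all proteins (the universe), but give filtered-out ones p = 1
allScore <- ifelse(passes_filters, proteins$pvalue, 1)
names(allScore) <- proteins$uniprot

# Selection function based on the p-value threshold only
selection <- function(allScore) {
  return(allScore < 0.01)
}

selected <- names(allScore)[selection(allScore)]
```

The named vector `allScore` and `selection` would then be passed to topGO as `allGenes` and `geneSel`.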

Thank you very much in advance!

R proteomics topGO enrichment • 436 views
3 months ago
dthorbur ★ 2.5k

1) I could not find a built-in mapping (the gene2GO parameter) that links my UniProt IDs to GO terms for rainbow trout, so I downloaded such a list from UniProt and imported it into R. Is this approach fine?

Unless it has changed since I last used it, topGO requires Bioconductor annotation packages, such as org.Hs.eg.db for human or org.Mm.eg.db for mouse, which contain the mappings to GO annotations. There are ways to create custom annotation packages, but I never got it to work when I tried a few years ago (though I didn't try very hard).
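topGO can also take a custom mapping through `annot = annFUN.gene2GO`, where `gene2GO` is a named list from protein ID to a character vector of GO IDs. A minimal base-R sketch of building that list from a UniProt download, assuming a hypothetical two-column table whose GO IDs are separated by semicolons (check the column names and separator in your own export):

```r
# Hypothetical UniProt export: one row per protein, GO IDs separated by ";"
uniprot_go <- data.frame(
  Entry = c("A0A1", "A0A2", "A0A3"),
  GO    = c("GO:0005524; GO:0046872", "GO:0005515", ""),
  stringsAsFactors = FALSE
)

# Split each GO field into a character vector and strip surrounding spaces
gene2GO <- lapply(strsplit(uniprot_go$GO, ";"), trimws)
# Remove empty strings left over from blank fields
gene2GO <- lapply(gene2GO, function(ids) ids[ids != ""])
names(gene2GO) <- uniprot_go$Entry

# Drop proteins without any GO annotation
gene2GO <- gene2GO[lengths(gene2GO) > 0]
```

The result could then be used as `new("topGOdata", ontology = "BP", allGenes = allScore, geneSel = selection, annot = annFUN.gene2GO, gene2GO = gene2GO)`. If the mapping is stored in a file, topGO's `readMappings()` is an alternative.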

2) I would like my analysis to take all three values (p-values, fold changes, and number of unique peptides) into account.

To me, it sounds like you're simply subsetting your dataset to define the proteins of interest, which is normal for enrichment analyses. You generally shouldn't change the universe (background) you test against unless you have a specific research question that merits such a change.

If there is no package for your model system on Bioconductor, you have a few options: 1) create your own annotation package to use with topGO, 2) assign orthologs and use the annotation package of a related species that has one (e.g., zebrafish), or 3) use a tool that makes it easy to set a custom universe, such as g:Profiler2 (a tutorial is available).

As an aside, you can subset data in R with the subset() function; you don't need to write your own function.
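A minimal sketch of that approach, with hypothetical column names and values: subset() picks the interesting proteins, while the universe remains every quantified protein. topGO also accepts a named 0/1 factor as `allGenes`, in which case no selection function is needed.

```r
# Hypothetical example data; replace with your own proteomics table
proteins <- data.frame(
  uniprot     = c("P1", "P2", "P3", "P4"),
  pvalue      = c(0.001, 0.004, 0.20, 0.008),
  fold_change = c(2.0, 1.2, 3.0, 0.5),
  peptides    = c(3, 5, 4, 1),
  stringsAsFactors = FALSE
)

# Significant, regulated more than 1.5-fold, and at least two unique peptides
interesting <- subset(proteins,
  pvalue < 0.01 & (fold_change > 1.5 | fold_change < 1 / 1.5) & peptides >= 2)

# The universe is still every quantified protein
universe <- proteins$uniprot

# 1 = interesting, 0 = background; names are the protein IDs
geneList <- factor(as.integer(universe %in% interesting$uniprot))
names(geneList) <- universe
```

`geneList` could then be supplied as `allGenes` to `new("topGOdata", ...)` together with `annot = annFUN.gene2GO` and your `gene2GO` list, and tested with `runTest(..., algorithm = "weight01", statistic = "fisher")`.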

Thank you very much for your answer.

1) I converted a data frame with the UniProt IDs in one column and the GO IDs in the other into a named list with the same format as in the quick manual.

2) OK, so as long as I do not subset my gene2GO parameter, I am fine?
