Dear all,
I am relatively new to the world of gene ontology, but I want to perform a GO analysis using the R package topGO. However, I have some questions to which I could not find an answer. My proteomics data contains p-values, fold changes and the number of unique peptides per protein.
1) I could not find a built-in mapping (for the gene2GO parameter) that connects my UniProt IDs with GO terms for rainbow trout. Thus, I simply downloaded such a list from UniProt and imported it into R. Is this approach fine?
2) For my analysis I would like to take all three values into account (p-values, fold change and number of unique peptides). Because my p-value threshold is 0.01, I construct my selection function like this: selection <- function(allScore){ return(allScore < 0.01)} and use the Fisher statistic with the weight01 algorithm. Additionally, I would like to only use proteins that show a fold change of more than 1.5 or less than 1/1.5 and, ideally, exclude proteins with fewer than two unique peptides. Can I trim my data before the analysis and exclude proteins based on fold change and number of unique peptides? I am afraid that this could interfere with the statistical analysis, since it would make my gene universe smaller. Alternatively, I could manually set the p-values of those proteins to 1 and keep the selection function based only on the p-value.
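To make the second option concrete, here is a minimal base-R sketch of it. It assumes a hypothetical data frame `prot` with made-up column names and values. Every protein stays in the score vector, and proteins failing the fold-change or peptide filter get a p-value of 1. This only illustrates the mechanics and does not say whether the approach is statistically appropriate.

```r
# Hypothetical proteomics table; IDs, column names and values are assumptions.
prot <- data.frame(
  uniprot_id  = c("protA", "protB", "protC", "protD"),
  p_value     = c(0.001, 0.005, 0.004, 0.200),
  fold_change = c(2.0, 1.2, 0.5, 3.0),
  unique_pep  = c(3, 4, 1, 5),
  stringsAsFactors = FALSE
)

# Extra filters: fold change > 1.5 or < 1/1.5, and at least two unique peptides.
passes_fc  <- prot$fold_change > 1.5 | prot$fold_change < 1 / 1.5
passes_pep <- prot$unique_pep >= 2

# Keep all proteins so the universe is not reduced;
# proteins failing a filter are assigned p = 1.
allScore <- ifelse(passes_fc & passes_pep, prot$p_value, 1)
names(allScore) <- prot$uniprot_id

# Selection function from the question.
selection <- function(allScore) {
  return(allScore < 0.01)
}

selected <- names(allScore)[selection(allScore)]
print(selected)

# With topGO installed and a gene2GO list available, these objects would be
# passed on roughly as follows (not run here; "BP" is an arbitrary choice):
# GOdata <- new("topGOdata", ontology = "BP", allGenes = allScore,
#               geneSel = selection, annot = annFUN.gene2GO, gene2GO = gene2GO)
# result <- runTest(GOdata, algorithm = "weight01", statistic = "fisher")
```

In this toy table only protA passes all three criteria, while the score vector still contains all four proteins.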
Thank you very much in advance!
Thank you very much for your answer.
1) I transformed a data frame with the UniProt IDs in one column and the GO IDs in the other into a named list with the same format as the one shown in the quick manual.
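For reference, a minimal sketch of that conversion in base R. It assumes a hypothetical table `uniprot_tab` with one row per protein and the GO IDs separated by "; " in a single column; the column names and protein IDs are made up. For a long table with one GO ID per row, split(go_id, uniprot_id) would give the same kind of list.

```r
# Hypothetical UniProt export: one row per protein, GO IDs separated by "; ".
uniprot_tab <- data.frame(
  uniprot_id = c("protA", "protB", "protC"),
  go_ids     = c("GO:0005524; GO:0006096", "GO:0005634", ""),
  stringsAsFactors = FALSE
)

# Split the GO column into one character vector per protein.
gene2GO <- strsplit(uniprot_tab$go_ids, ";\\s*")
names(gene2GO) <- uniprot_tab$uniprot_id

# Drop proteins that have no GO annotation at all.
gene2GO <- gene2GO[lengths(gene2GO) > 0]

str(gene2GO)
```

The result is a named list whose names are protein IDs and whose elements are character vectors of GO IDs.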
2) OK, so as long as I do not subset my gene2GO parameter, I am fine?