Hello everyone,
I am working with a non-model fungus and trying to do a gene enrichment analysis for differentially expressed genes. For this, I obtained the GO terms for a list of genes using InterProScan. Now I am using this list of genes with their respective GO terms to do the enrichment analysis and to make plots in R using the topGO package.
I am following this workshop from UC Davis to achieve my goal; however, I have a small problem with the p-values. I used this code to build the gene list from the p-values:
tmp <- ifelse(DE$adj.P.Val < pcutoff, 1, 0)  # 1 = significant, 0 = not significant
geneList <- tmp
and I am getting the following final result:
  GO.ID      Term                                           Annotated Significant Expected raw.p.value
1 GO:0008213 protein alkylation                                     1           1        1           1
2 GO:0046903 secretion                                              1           1        1           1
3 GO:0042219 cellular modified amino acid catabolic process         1           1        1           1
4 GO:0006979 response to oxidative stress                           1           1        1           1
5 GO:0019438 aromatic compound biosynthetic process                77          77       77           1
6 GO:0015672 monovalent inorganic cation transport                  2           2        2           1
Instead of a raw.p.value of 1, I need the actual p-values I provided in the file. Can you please help me find what I am missing? I have gone over my code multiple times, and I have not done anything different from the tutorial, yet my results are not what I need.
I would appreciate your help. Thanks, Ambika
It seems that you are not providing all the commands that you have used, so it is difficult for us to debug this for you. Also, it would really help to provide a minimal reproducible example as input data.
Kevin,
This is what my input files look like:
GO_total_upregulated_genes.csv
Total_upregulated_genes_pvalue.csv
My code looks like this:
I am only providing the genes that are significantly upregulated. As you can see, the Annotated, Significant and Expected gene numbers are also the same. Am I doing anything wrong here? Please advise.
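A likely reason for the identical Annotated, Significant and Expected columns: if the gene universe given to topGO contains only the significantly upregulated genes, every annotated gene is also significant, so there is nothing to be enriched against and an over-representation test can only return p = 1. The base-R sketch below (not topGO's own code) reproduces this with the upper tail of the hypergeometric distribution, i.e. a one-sided Fisher's exact test, which is the same kind of count-based test as topGO's Fisher statistic. The gene counts 500 and 8000 are hypothetical.

```r
# Sketch: over-representation p-value for one GO term, computed as the upper
# tail of the hypergeometric distribution (a one-sided Fisher's exact test).
enrich_p <- function(annotated, significant, n_sig, n_all) {
  # P(X >= significant), X ~ Hypergeometric with n_sig significant genes,
  # n_all - n_sig non-significant genes and 'annotated' genes drawn
  phyper(significant - 1, m = n_sig, n = n_all - n_sig,
         k = annotated, lower.tail = FALSE)
}

# Expected number of significant genes in the term under random sampling
expected_count <- function(annotated, n_sig, n_all) {
  annotated * n_sig / n_all
}

# Hypothetical case 1: the universe holds only the 500 upregulated genes,
# so every gene is "significant" (like GO:0019438 with 77 annotated genes).
p_only_sig   <- enrich_p(77, 77, n_sig = 500, n_all = 500)
exp_only_sig <- expected_count(77, n_sig = 500, n_all = 500)

# Hypothetical case 2: the universe holds all 8000 tested genes,
# of which the same 500 are significant.
p_full   <- enrich_p(77, 77, n_sig = 500, n_all = 8000)
exp_full <- expected_count(77, n_sig = 500, n_all = 8000)

print(c(p_only_sig = p_only_sig, exp_only_sig = exp_only_sig))
print(c(p_full = p_full, exp_full = exp_full))
```

In the first case Expected equals Annotated and the p-value is exactly 1, matching the table above; only when non-significant genes are part of the universe can a term show enrichment.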
Are you sure that your gene2GO object is constructed correctly? It should be a list, as elaborated in section 4.3 of: https://www.bioconductor.org/packages/devel/bioc/vignettes/topGO/inst/doc/topGO.pdf
Also, I cannot verify this from your code, but geneList should be a vector of p- (or other) values whose names are gene names, i.e., a named vector.
Kevin, I think I do have a list as mentioned in the protocol. The list looks like this:
As for the p-values, I am not sure. I am assigning the significant p-values as 1 and the non-significant ones as 0, and the
I don't know where I went wrong, but I am kind of stuck and cannot proceed further because I need those p-values to make plots.
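On the gene2GO question: topGO's annFUN.gene2GO expects a named list whose names are gene IDs and whose elements are character vectors of GO IDs. A minimal base-R sketch for reshaping an InterProScan-derived table into that form is below. The table layout (columns gene and go, with comma-separated GO IDs) and the FUN_ IDs are hypothetical, since the actual files were not shown.

```r
# Sketch: turn a per-gene annotation table into the gene2GO structure topGO
# expects: a named list, one character vector of GO IDs per gene ID.
# Assumed (hypothetical) layout: column 'gene' = gene ID, column 'go' =
# comma-separated GO IDs; a gene may appear on several rows.
build_gene2go <- function(tab, sep = ",") {
  go_split <- strsplit(as.character(tab$go), split = sep, fixed = TRUE)
  genes    <- rep(as.character(tab$gene), lengths(go_split))
  go_long  <- trimws(unlist(go_split))
  keep     <- nzchar(go_long)            # drop empty entries (no GO terms)
  gene2go  <- split(go_long[keep], genes[keep])
  lapply(gene2go, unique)                # remove duplicated GO IDs per gene
}

annot <- data.frame(
  gene = c("FUN_000001", "FUN_000002", "FUN_000003"),  # hypothetical IDs
  go   = c("GO:0008213, GO:0046903", "GO:0006979", ""),
  stringsAsFactors = FALSE
)

gene2go <- build_gene2go(annot)
str(gene2go)
```

Genes without any GO term are dropped. The resulting list would be passed to topGO as gene2GO = gene2go together with annot = annFUN.gene2GO; topGO's readMappings() can also read a mapping file laid out as gene ID, a tab, then comma-separated GO IDs.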
As far as I understand, the code is running, but you want 'raw' p-values either for the GO terms or for the functional entities (FUN_*)?
You are eliminating all information about p-values with this line:
Perhaps I am not fully understanding what the problem is. I provide an example with topGO and the Kolmogorov-Smirnov test here: A: GO analysis using topGO
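The point about the 0/1 conversion discarding the p-values can be illustrated as follows: keep the score of every tested gene (not only the upregulated ones) in a named numeric vector, and let a selection function decide which genes count as significant. DE, its columns Gene and adj.P.Val, the gene IDs and the 0.05 cutoff are hypothetical stand-ins for the objects in the question.

```r
# Sketch: keep the p-values instead of converting them to 0/1, and name the
# vector with gene IDs. All data below are hypothetical.
DE <- data.frame(
  Gene      = c("FUN_000001", "FUN_000002", "FUN_000003", "FUN_000004"),
  adj.P.Val = c(0.001, 0.020, 0.300, 0.850),
  stringsAsFactors = FALSE
)
pcutoff <- 0.05

# All tested genes form the universe; names are the gene IDs
geneList <- setNames(DE$adj.P.Val, DE$Gene)

# Selection function: returns TRUE for genes treated as "significant"
# by count-based tests such as Fisher's exact test
topDiffGenes <- function(allScore) allScore < pcutoff

print(geneList)
print(topDiffGenes(geneList))
```

With topGO installed, these would be used roughly as new("topGOdata", ontology = "BP", allGenes = geneList, geneSel = topDiffGenes, annot = annFUN.gene2GO, gene2GO = gene2GO). Because the scores are kept, runTest(..., algorithm = "classic", statistic = "ks") can also use them, as in the linked Kolmogorov-Smirnov example.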
Yes, I want the p-values and the gene IDs (FUN__) in my results, which I am not getting. Thank you for the link; I will check it.
The results would be of the form:
If you want to add the original p-value for each gene identified in each term, then that will require customised coding, I think.
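One way such customised coding could look, as a sketch: start from a list that maps each GO ID to its genes (for a topGOdata object, genesInTerm() returns a named list of this shape) and look up each gene's own p-value in the named geneList. term_gene_table() is a hypothetical helper, and the IDs and values are invented for illustration.

```r
# Sketch: for each GO term, list its genes with their own p-values.
# term2genes: named list, GO ID -> character vector of gene IDs
# geneList:   named numeric vector of per-gene p-values
# only_sig:   optional function returning TRUE for genes to keep
term_gene_table <- function(term2genes, geneList, only_sig = NULL) {
  rows <- lapply(names(term2genes), function(go) {
    genes <- intersect(term2genes[[go]], names(geneList))
    if (!is.null(only_sig)) genes <- genes[only_sig(geneList[genes])]
    if (length(genes) == 0) return(NULL)
    data.frame(GO.ID = go, gene = genes, p.value = unname(geneList[genes]),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)   # NULL entries (empty terms) are skipped
  rownames(out) <- NULL
  out
}

# Hypothetical inputs
geneList <- c(FUN_000001 = 0.001, FUN_000002 = 0.020,
              FUN_000003 = 0.300, FUN_000004 = 0.850)
term2genes <- list(
  "GO:0006979" = c("FUN_000001", "FUN_000003"),
  "GO:0046903" = c("FUN_000002", "FUN_000004")
)

res <- term_gene_table(term2genes, geneList, only_sig = function(p) p < 0.05)
print(res)
```

Since the GenTable() output also has a GO.ID column, the two tables could then be joined with merge(res, allRes, by = "GO.ID"), where allRes stands for the GenTable() result.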