I have just been assigned to a project in bioinformatics, a field that is new to me. It involves the following:
There is a list of genes. These genes have been associated with pathways derived from the KEGG database. I also have the KEGG genes and their associated pathways. I have to calculate the significant pathways that are present in my dataset. For that, I have to do a hypergeometric test. After that, I have to select the pathways that have p-values less than 0.005.
What is the meaning of choosing the pathways within this cut-off? When I already know that the genes in my dataset belong to certain pathways, why do I need to do a hypergeometric test? Why would it not be enough to detect the pathways present in my dataset by finding the intersection between my gene set and KEGG's?
Here is a simple example:
If the numbers are too close, there is a high chance that even a random selection of genes would share some genes with a pathway, just as your list does. The hypergeometric test is there to confirm that the genes in your list represent a pathway because of the condition you are testing, and not by chance.
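To make the point above concrete, here is a minimal sketch of the one-sided hypergeometric test in plain Python. The function name `hypergeom_pvalue` and all of the numbers (background size, pathway size, list size, overlaps) are made up for illustration and are not taken from the question or from KEGG. It computes the probability of seeing an overlap of at least `k` genes when `n` genes are drawn at random from a background of `N` genes, `K` of which belong to the pathway.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Upper-tail hypergeometric p-value, P(overlap >= k).

    N: number of genes in the background (hypothetical value in the example)
    K: number of background genes annotated to the pathway
    n: number of genes in your list
    k: number of genes in your list that fall in the pathway
    """
    total = comb(N, n)  # all ways to draw n genes from the background
    upper = min(K, n)   # the overlap cannot exceed either set size
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Illustrative numbers only: 20000 background genes, a 100-gene pathway,
# and a 200-gene list. By chance you expect about 200 * 100 / 20000 = 1 hit.
p_one_hit = hypergeom_pvalue(20000, 100, 200, 1)
p_ten_hits = hypergeom_pvalue(20000, 100, 200, 10)

print("overlap >= 1:", p_one_hit)
print("overlap >= 10:", p_ten_hits)
```

With these made-up numbers, a single shared gene is what you would expect from a random list, so a plain intersection would report the pathway as "present" even though the test gives it a large p-value. An overlap of ten genes is far more than chance would explain, and that is the kind of pathway a cut-off such as 0.005 is meant to keep.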
This is a very helpful guide.
Was this not something you wanted to do? On top of that, you are being told in detail what you need to do. So what is the purpose of this exercise? Are you expected to learn something in the process, or just to complete the task at hand?
Take a look at some helpful GO enrichment analysis materials here. The same principles apply in your case. Some useful tools are listed in this Wikipedia link.