I have 128 clusters of genes, each containing a number of genes. The dataset is a case-control dataset with two classes, diseased and non-diseased; the disease is autism. SFARI provides a well-defined list of autism-associated genes, compiled and scored in different categories.
My first question: how can I check the significance of these clusters, that is, test whether I obtained them by chance? What is the procedure, and is there a tool or R package for this? My second question: how can I check the biological significance of these gene clusters, and is there a tool or R package for that as well?
It would be very helpful if someone could suggest the relevant steps I should follow, ideally based on the literature.
Regards
I understand that, but how can I use a gold-standard list of high-quality genes? As I mentioned, a list of autism genes can be downloaded from SFARI. How can I use this information to test the significance of my gene clusters? Could you please elaborate?
To test any form of significance (i.e., enrichment analysis), IMO you need to build a contingency table https://en.wikipedia.org/wiki/Contingency_table . Once you have it, you're golden. However, the table depends entirely on the scientific question you want to answer, and how you construct it is up to you. By the way, I am answering your 2nd question; for the 1st one (significance of the clusters) there is no well-defined answer. You may be able to extract useful information from the methods section of this paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5802446/
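A minimal sketch of one possible contingency table, assuming the background is the full set of genes that went into the clustering and that "significance" means over-representation of SFARI genes in a cluster (the list could optionally be restricted to particular SFARI score categories first). For each cluster it builds the 2x2 table (in cluster / not in cluster vs. in SFARI / not in SFARI) and computes a one-sided Fisher's exact p-value, which equals the hypergeometric upper tail. Because all 128 clusters are tested at once, the p-values are then Benjamini-Hochberg adjusted. The function names, gene names, and toy data are hypothetical. In R, base `fisher.test()` and `p.adjust(..., method = "BH")` do the same job.

```python
from math import comb


def enrichment_test(cluster_genes, sfari_genes, background_genes):
    """Build the 2x2 contingency table for one cluster and return it together
    with a one-sided Fisher's exact (hypergeometric) p-value for
    over-representation of SFARI genes in that cluster."""
    background = set(background_genes)
    cluster = set(cluster_genes) & background
    sfari = set(sfari_genes) & background

    N = len(background)        # assumption: all genes that were clustered
    K = len(sfari)             # SFARI genes among them
    n = len(cluster)           # genes in this cluster
    k = len(cluster & sfari)   # SFARI genes in this cluster

    #              in SFARI   not in SFARI
    # in cluster       k          n - k
    # elsewhere      K - k    N - n - K + k
    table = [[k, n - k], [K - k, N - n - K + k]]

    # P(X >= k) for X ~ Hypergeometric(N, K, n); math.comb returns 0 when
    # the lower argument exceeds the upper one, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return table, hits / comb(N, n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical toy data: 100 clustered genes, 10 of which are "SFARI" genes.
background = [f"G{i}" for i in range(100)]
sfari = background[:10]
clusters = {
    "cluster_A": background[:5] + background[50:55],  # 5 of 10 genes are SFARI
    "cluster_B": background[20:30],                   # no SFARI genes
}

results = {name: enrichment_test(genes, sfari, background)
           for name, genes in clusters.items()}
names = list(results)
fdr = benjamini_hochberg([results[name][1] for name in names])
for name, q in zip(names, fdr):
    table, p = results[name]
    print(name, table, f"p={p:.3g}", f"FDR={q:.3g}")
```

With real data you would replace the toy lists with your 128 clusters, the SFARI download, and the genes you actually clustered. The choice of background strongly affects the result, so it is worth stating explicitly.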