I am working on methods for GWAS, and I want to test them on a real GWAS data set with validated associations. Where can I find one?
A human disease data set would be best, but I don't think there are open ones with validated associations, and probably not even open ones without validated associations unless I apply to NIH.
However, I believe there should be some data sets for the famous plant A. thaliana. Where can I find them?
Thanks.
What do you mean by a validated association? How would you validate an association?
As a person on the computer science side, I cannot. I hope that someone has collected validated associations from published work in which actual wet lab experiments were conducted to validate them. @decosterwouter
There is no such thing as "wet lab validation of association". Furthermore, most association studies that I am aware of (in neurodegeneration) are just signals from tagging SNPs, without truly finding the functional variants or explaining the disease effect. Perhaps you could consider these not validated.
Thank you very much. I didn't realize that. Then, if I am working on a GWAS method (such as a machine learning model), how should I verify that my model works?
What is the outcome of your model? Associated loci?
Yes. So I hope that there are validated associated loci that I can use to verify the result. (If I verify the loci by calculating p-values for them, it will be hard to convince others that my model works better than traditional p-value-based methods.)
If you are looking for strong associations from large GWAS, you could try GRASP: https://grasp.nhlbi.nih.gov/FullResults.aspx. This does not give you raw individual-level data per se, but it does give you many strong (summary-level, multi-individual) associations that you could consider pursuing for functional studies.
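If you end up using such a list of strong reported associations as a reference, one rough way to check a model's output is to count how many reference loci it recovers within some distance window. Below is a minimal sketch of that idea, not an established benchmark. The function names, the `(chromosome, position)` tuple format, the 50 kb window, and the toy coordinates are all assumptions made for illustration; nothing here reflects the actual GRASP export format. The window is there because reported hits are often tagging SNPs rather than the functional variant, so exact position matches are too strict.

```python
from bisect import bisect_left
from collections import defaultdict


def index_by_chrom(loci):
    """Group (chrom, pos) pairs into sorted position lists per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in loci:
        by_chrom[chrom].append(pos)
    return {chrom: sorted(positions) for chrom, positions in by_chrom.items()}


def has_neighbor(index, chrom, pos, window):
    """Return True if any indexed position on `chrom` is within `window` bp of `pos`."""
    positions = index.get(chrom, [])
    # First indexed position that is >= pos - window.
    i = bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def recovery_stats(predicted, reference, window=50_000):
    """Compare predicted loci with reference loci (hypothetical helper).

    recall:    fraction of reference loci with a predicted locus nearby
    precision: fraction of predicted loci with a reference locus nearby
    """
    pred_index = index_by_chrom(predicted)
    ref_index = index_by_chrom(reference)
    recovered = sum(has_neighbor(pred_index, c, p, window) for c, p in reference)
    supported = sum(has_neighbor(ref_index, c, p, window) for c, p in predicted)
    return {
        "recall": recovered / len(reference) if reference else 0.0,
        "precision": supported / len(predicted) if predicted else 0.0,
    }


if __name__ == "__main__":
    # Toy coordinates, made up for illustration only.
    reference = [("1", 1_000_000), ("2", 5_000_000), ("3", 250_000)]
    predicted = [("1", 1_020_000), ("2", 9_000_000)]
    print(recovery_stats(predicted, reference))
```

Keep in mind that a reference list of reported associations is not a complete truth set, so a predicted locus without a nearby reference hit is not necessarily a false positive. The "precision" figure is best read as a lower bound.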