Question

How can I extract classification/regression metrics from a GWAS so that I can compare different tools?



5.5 years ago

b.ambrozio ▴ 30

If I understand correctly, a GWAS is essentially a feature-selection approach based on a classification or regression algorithm, depending on whether the underlying trait is qualitative or quantitative.

My question is: how can I extract the classification/regression metrics from a GWAS run with, for example, PLINK, GCTA, SAIGE, or BOLT-LMM?

Hypothetical scenario: I'm looking for causal SNPs for type 2 diabetes in a highly unbalanced (case:control = 1:100) and relatively large dataset (N > 6k). I know that SAIGE is usually the best tool for such a scenario, but I want to compare its results with those of the other tools as well. For classification algorithms, we usually use a confusion matrix (true/false positives, true/false negatives), from which we can calculate accuracy, precision, recall (sensitivity), F1 score, etc.
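For reference, here is a minimal Python sketch of how those metrics follow from the four confusion-matrix counts. The counts in the usage line are hypothetical (60 cases and 6000 controls, matching the 1:100 ratio above). They illustrate why accuracy alone is misleading under heavy imbalance: a classifier that calls everyone a control still reaches about 99% accuracy with zero recall.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive standard metrics from the four cells of a 2x2 confusion matrix."""

    def safe_div(num: float, den: float) -> float:
        # Return NaN instead of raising when a metric is undefined (e.g. no predicted cases).
        return num / den if den else math.nan

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # recall and sensitivity are the same quantity
    specificity = safe_div(tn, tn + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 60 cases, 6000 controls, and a classifier that calls everyone a control.
print(classification_metrics(tp=0, fp=0, tn=6000, fn=60))
# accuracy is about 0.99 while recall is 0.0 and precision is undefined (NaN)
```

With such an imbalance, precision, recall, and F1 (or precision-recall curves) are generally more informative than accuracy.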

So, how do I get a confusion matrix from a GWAS based on a classification algorithm? Is it possible to go beyond the GWAS and, with the tools mentioned, run the classification using the features (SNPs) it selected?
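The association tools report per-SNP statistics (effect size, standard error, p-value) rather than per-individual predictions, so an individual-level confusion matrix needs a separate prediction step. One simple option, sketched here under assumptions, is an additive risk score: each selected SNP's allele dosage (0/1/2) is weighted by its estimated effect (e.g. a log odds ratio) and summed, and anyone above a cutoff is called a case. The function names, toy genotypes, effect sizes, and cutoff are all hypothetical. In practice the score should be evaluated on individuals who were not part of the GWAS, otherwise the metrics will be optimistically biased. PLINK's `--score` command can compute such weighted sums from genotype files.

```python
from typing import Dict, List, Sequence


def risk_scores(dosages: Sequence[Sequence[float]], betas: Sequence[float]) -> List[float]:
    """Additive score per individual: sum over selected SNPs of beta_j * dosage_ij."""
    return [sum(b * d for b, d in zip(betas, row)) for row in dosages]


def confusion_counts(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Dict[str, int]:
    """Call an individual a case when score > cutoff, then tally against true labels (1 = case)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_case = score > cutoff
        if predicted_case and label == 1:
            counts["tp"] += 1
        elif predicted_case:
            counts["fp"] += 1
        elif label == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical toy data: 5 held-out individuals x 2 selected SNPs (dosage 0/1/2).
dosages = [[2, 0], [1, 1], [0, 0], [0, 2], [1, 0]]
betas = [0.5, 0.2]  # hypothetical per-SNP effect sizes (e.g. log odds ratios)
labels = [1, 1, 0, 0, 1]
print(confusion_counts(risk_scores(dosages, betas), labels, cutoff=0.5))
# {'tp': 2, 'fp': 0, 'tn': 2, 'fn': 1}
```

Sweeping the cutoff instead of fixing it gives an ROC or precision-recall curve for the score.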

I found a lot of material comparing tools in terms of "false positives", "statistical power", etc., but I still don't understand how these were evaluated, since I don't see how to obtain a confusion matrix from GWAS models. That is, I don't see any classification happening after the feature selection (after the SNP p-values are assigned).
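Tool comparisons commonly rely on simulated phenotypes, where the causal SNPs are known, so the confusion matrix is built over SNPs rather than individuals: power is the fraction of causal SNPs reaching significance, and the empirical type I error rate is the fraction of non-causal SNPs that do (often assessed in null simulations with no causal SNPs at all). A minimal sketch, assuming a dict of per-SNP p-values from any of the tools and a known set of causal SNP IDs (both hypothetical here), with the conventional genome-wide threshold of 5e-8. It ignores linkage disequilibrium; in real comparisons, significant non-causal SNPs in LD with a causal one are often handled separately, for example by counting hits within a window around each causal SNP.

```python
import math
from typing import Dict, Set

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def snp_level_confusion(pvalues: Dict[str, float], causal: Set[str],
                        alpha: float = GENOME_WIDE_ALPHA) -> Dict[str, int]:
    """Tally SNPs by (significant?, truly causal?) in a simulation with known causal SNPs."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for snp, p in pvalues.items():
        significant = p < alpha
        is_causal = snp in causal
        if significant and is_causal:
            counts["tp"] += 1
        elif significant:
            counts["fp"] += 1
        elif is_causal:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def power_and_type1(counts: Dict[str, int]) -> Dict[str, float]:
    """Power = TP / (TP + FN); empirical type I error rate = FP / (FP + TN)."""

    def rate(num: int, den: int) -> float:
        # Undefined rates (e.g. power in a null simulation) are reported as NaN.
        return num / den if den else math.nan

    return {
        "power": rate(counts["tp"], counts["tp"] + counts["fn"]),
        "type1_error": rate(counts["fp"], counts["fp"] + counts["tn"]),
    }


# Hypothetical p-values from one simulated replicate; snp1 and snp2 are the simulated causal SNPs.
pvals = {"snp1": 1e-10, "snp2": 0.03, "snp3": 2e-9, "snp4": 0.5}
counts = snp_level_confusion(pvals, {"snp1", "snp2"})
print(counts, power_and_type1(counts))
```

Averaging these rates over many simulated replicates, per tool, is what makes them comparable across PLINK, GCTA, SAIGE, and BOLT-LMM.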

plink SAIGE GCTA BOLT-LMM metrics
