Hi,
You asked three questions, so I will try to help as best I can.
What should be the optimal GC score cutoff
It sounds like you are working with Illumina's GenomeStudio Final Genotype Report. Please correct me if that assumption is wrong, because the rest of my answer is based on it.
There is no global interpretation of a GenCall (GC) score, because it depends on the clustering of samples at each SNP. Clustering can be affected by many different variables, including the quality of the samples and loci.
A GenCall score is calculated for every genotype and can range from 0.0 to 1.0. GenCall scores are calculated using information from the sample clustering algorithm. Each SNP is evaluated based on the angle of the clusters, the dispersion of the clusters, and intensity. Genotypes with lower GenCall scores lie farther from the center of a cluster and are less reliable. If you have access to the GenomeStudio GT Module (again, assuming that you were given Illumina's GenomeStudio Final Genotype Report), you should edit loci that are not clustered or called correctly to make full use of your data set.
Now let's say that you don't have access to the GenomeStudio GT Module, and you only have a file containing a GC score column. The Illumina FastTrack Genotyping Project Managers typically use a "no-call" threshold of 0.15 with Infinium data. This means that data points with a GC score < 0.15 are not assigned genotypes, because they are considered too far from the cluster centroid to be called reliably. Does your column have any scores less than 0.15? That would be a good place to start looking at your data. Your report may already have had all of the unreliable genotype calls filtered out.
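To check whether your file still contains scores below 0.15, something like the following rough sketch could work. I am assuming a tab-delimited table with a column named "GC Score"; the column names, the sample rows, and the helper `split_by_gc_score` are all made up for illustration, so adjust them to whatever your report actually contains.

```python
import csv
import io

# Hypothetical excerpt of a tab-delimited genotype table.
# The column names and all values are assumptions for illustration only.
REPORT = """SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\tGC Score
rs0001\tS1\tA\tA\t0.9321
rs0001\tS2\tA\tG\t0.1204
rs0002\tS1\tG\tG\t0.4410
rs0002\tS2\t-\t-\t0.0000
"""

NO_CALL_THRESHOLD = 0.15  # the commonly used Infinium "no-call" threshold


def split_by_gc_score(handle, threshold=NO_CALL_THRESHOLD, column="GC Score"):
    """Split rows into (kept, dropped) by comparing the score column to threshold."""
    kept, dropped = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row[column]) >= threshold:
            kept.append(row)
        else:
            dropped.append(row)
    return kept, dropped


# For a real file, replace io.StringIO(REPORT) with open(path, newline="").
kept, dropped = split_by_gc_score(io.StringIO(REPORT))
total = len(kept) + len(dropped)
print(f"{len(dropped)} of {total} genotypes fall below {NO_CALL_THRESHOLD}")
```

If nothing falls below the threshold, the report was probably filtered before it reached you.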
However, as I mentioned at the start, there is no global interpretation of a GenCall score. You could experiment with different cutoff values and assess their impact on your analysis to determine the most suitable threshold.
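One simple way to experiment is to look at how the call rate (the fraction of genotypes you keep) changes as the cutoff moves. This is only a sketch: the scores below are invented, the function name `call_rate_by_cutoff` is my own, and call rate is just one of several things you might want to track.

```python
def call_rate_by_cutoff(scores, cutoffs):
    """Return {cutoff: fraction of genotypes whose score is >= cutoff}."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    return {c: sum(s >= c for s in scores) / n for c in cutoffs}


# Made-up GC scores, purely for illustration.
gc_scores = [0.93, 0.12, 0.44, 0.00, 0.78, 0.26, 0.51, 0.88]

rates = call_rate_by_cutoff(gc_scores, [0.15, 0.25, 0.50])
for cutoff, rate in rates.items():
    print(f"cutoff {cutoff:.2f}: call rate {rate:.3f}")
```

A sharp drop in call rate between two cutoffs tells you that many genotypes sit in that score range, which is worth knowing before you settle on a threshold.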
Which alleles should be compared
What is your specific research question? What is your genetic model for the analysis?
Here are some common allele comparisons:
- Case-control study. The frequency of each allele is compared between cases (individuals with the trait or condition of interest) and controls (individuals without the trait or condition).
- Allelic association. Compare the frequency of one specific allele between cases and controls. This should be done if your hypothesis is that a specific allele is involved in the trait or condition you are investigating.
- Genotypic association. Compare the frequencies of the different genotypes at the SNP locus between cases and controls. This tests whether individuals who are homozygous for allele 1, heterozygous, or homozygous for allele 2 are more common among cases than among controls.
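To make the difference between the allelic and genotypic comparisons concrete, here is a small sketch that builds both kinds of count table from the same genotypes. The genotypes are invented, and I am assuming each call is a two-letter string such as "AG"; the function names are mine.

```python
from collections import Counter


def genotype_counts(genotypes, a1, a2):
    """Count [a1/a1, a1/a2, a2/a2]; genotypes are 2-character strings like 'AG'."""
    # Sort the two alleles so that 'AG' and 'GA' are counted together.
    counts = Counter("".join(sorted(g)) for g in genotypes)
    het = "".join(sorted(a1 + a2))
    return [counts[a1 + a1], counts[het], counts[a2 + a2]]


def allele_counts(geno_counts):
    """Convert [hom1, het, hom2] genotype counts into [allele1, allele2] counts."""
    hom1, het, hom2 = geno_counts
    return [2 * hom1 + het, 2 * hom2 + het]


# Made-up genotypes at a single SNP with alleles A and G.
cases = ["AA", "AG", "GA", "GG", "AG", "AA"]
controls = ["GG", "AG", "GG", "GG", "AA", "GG"]

# Genotypic comparison: a 2 x 3 table (rows: cases, controls).
genotype_table = [genotype_counts(cases, "A", "G"),
                  genotype_counts(controls, "A", "G")]

# Allelic comparison: a 2 x 2 table, where every individual contributes two alleles.
allele_table = [allele_counts(row) for row in genotype_table]

print("genotype table:", genotype_table)
print("allele table:  ", allele_table)
```

Note that the allelic table counts chromosomes rather than people, so its total is twice the number of individuals.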
How should I conduct these analyses
I would recommend reading the methods sections of the literature relevant to your research question to decide how to conduct the analysis for the alleles you want to compare. For a binary trait (such as disease status), common choices include logistic regression, the chi-square test, and Fisher's exact test.
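As an illustration of two of those tests, here is a standard-library-only sketch of the Pearson chi-square test (no continuity correction) and a two-sided Fisher's exact test on a 2 x 2 allele count table. The counts are invented and the function names are mine; in practice you would more likely use an established statistics package or a dedicated GWAS tool, and logistic regression is not shown here.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value: sum of all tables as or less likely than observed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_observed = prob(a)
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)


# Made-up allele counts: rows are cases and controls, columns are allele A and allele G.
allele_table = [[7, 5], [3, 9]]

stat, p_chi = chi_square_2x2(allele_table)
p_fisher = fisher_exact_2x2(allele_table)
print(f"chi-square = {stat:.3f}, p = {p_chi:.3f}")
print(f"Fisher exact p = {p_fisher:.3f}")
```

With counts this small the chi-square approximation is shaky, which is the usual reason to prefer Fisher's exact test for sparse tables.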