Association study using SNP data
6 months ago

Hi,

I have metadata containing information on 15 samples and a SNP-map file with columns: Index, SNPName, Chromosome, Position, GenTrain Score, SNP, ILMN Strand, Customer Strand, and NormID. Additionally, I have a file with allele information, including columns for SNP Name, Sample ID, Allele1 - Forward, Allele2 - Forward, Allele1 - Top, Allele2 - Top, Allele1 - AB, Allele2 - AB, GC Score, and XY. With this data, I aim to investigate associations between different sample groups, such as treated versus control etc. What should be the optimal GC score cutoff, which alleles should be compared, and how should I conduct these analyses? Thanks in advance.

Association Allele Parentage SNP • 638 views
6 months ago
Mathew ▴ 180

Hi,

You asked three questions, so I will do my best to answer each in turn.

What should be the optimal GC score cutoff

It sounds like you are working with a Final Genotype Report exported from Illumina's GenomeStudio. Please correct me if that assumption is wrong, because the rest of my answer is based on it.

There is no global interpretation of a GenCall (GC) score, because it depends on how the samples cluster at each SNP. Clustering can be affected by many variables, including the quality of the samples and of the loci.

A GenCall score is calculated for every genotype and ranges from 0.0 to 1.0. GenCall scores are calculated using information from the sample clustering algorithm. Each SNP is evaluated based on the angle of the clusters, the dispersion of the clusters, and intensity. Genotypes with lower GenCall scores lie farther from the center of a cluster and are less reliable. If you have access to the GenomeStudio GT Module (again, this assumes you were given a GenomeStudio Final Genotype Report), you should edit loci that are not clustered or called correctly so that you can make full use of your data set.

Now let's say that you don't have access to the GenomeStudio GT Module and only have a file containing a GC Score column. In general, the Illumina FastTrack Genotyping Project Managers typically use a "no-call" threshold of 0.15 with Infinium data. This means genotypes with a GC score < 0.15 are not assigned a call, because they are considered too far from the cluster centroid to be called reliably. Does your GC Score column contain any values below 0.15? That would be a good place to start looking at your data. Your report may already have had all of the unreliable genotype calls filtered out.
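As a minimal sketch of that first check, the snippet below splits genotype records at the 0.15 no-call threshold. The column names follow the ones listed in the question; the rows themselves and the helper name `split_by_gc` are hypothetical and only there for illustration.

```python
NO_CALL_THRESHOLD = 0.15  # no-call threshold quoted above for Infinium data

# Hypothetical records using the column names from the question.
rows = [
    {"SNP Name": "snp1", "Sample ID": "S1", "GC Score": 0.92},
    {"SNP Name": "snp1", "Sample ID": "S2", "GC Score": 0.10},
    {"SNP Name": "snp2", "Sample ID": "S1", "GC Score": 0.15},
    {"SNP Name": "snp2", "Sample ID": "S2", "GC Score": 0.47},
]


def split_by_gc(rows, threshold=NO_CALL_THRESHOLD):
    """Return (kept, dropped); genotypes with GC Score >= threshold are kept."""
    kept = [r for r in rows if r["GC Score"] >= threshold]
    dropped = [r for r in rows if r["GC Score"] < threshold]
    return kept, dropped


kept, dropped = split_by_gc(rows)
print(len(kept), "kept;", len(dropped), "below threshold")
```

If `dropped` is empty on the real file, the report was probably filtered before export.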

However, as I mentioned at the start, there is no global interpretation of a GenCall score. You could experiment with different cutoff values and assess how each one affects your analysis to find the most suitable threshold.
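One simple way to run that experiment is to look at how many genotypes survive each candidate cutoff. This is only a sketch: the scores are made up, and every cutoff other than 0.15 is an arbitrary value chosen for illustration, not a recommendation.

```python
# Hypothetical GC scores for ten genotype calls.
gc_scores = [0.05, 0.12, 0.20, 0.33, 0.48, 0.61, 0.77, 0.85, 0.90, 0.95]


def retained_fraction(scores, cutoff):
    """Fraction of genotype calls whose GC score is at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


# Candidate cutoffs; only 0.15 comes from the discussion above.
cutoffs = [0.15, 0.25, 0.50, 0.70]
summary = {c: retained_fraction(gc_scores, c) for c in cutoffs}

for cutoff, fraction in summary.items():
    print(f"cutoff {cutoff:.2f}: {fraction:.0%} of calls retained")
```

The same loop can be wrapped around the downstream association test to see whether the conclusions change with the cutoff.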

Which alleles should be compared

What is your specific research question? What is your genetic model for the analysis?

Here are some common allele comparisons:

  1. Case-control study. The frequency of each allele is compared between cases (individuals with the trait or condition of interest) and controls (individuals without it).
  2. Allelic association. Compare the frequency of one specific allele between cases and controls. Use this if your hypothesis is that a specific allele is involved in the trait or condition you are investigating.
  3. Genotypic association. Compare the frequencies of the genotype combinations at the SNP locus between cases and controls. This tests whether individuals who are homozygous for allele 1, heterozygous, or homozygous for allele 2 are more common among cases than among controls.
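The difference between the allelic and genotypic comparisons is easiest to see in how the counts are tallied. The sketch below assumes AB-coded calls (the Allele1 - AB and Allele2 - AB columns from the question) for a single SNP; the group labels and genotypes are invented, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical calls at one SNP: (group, Allele1 - AB, Allele2 - AB).
calls = [
    ("treated", "A", "A"), ("treated", "A", "B"),
    ("treated", "A", "B"), ("treated", "B", "B"),
    ("control", "B", "B"), ("control", "A", "B"),
    ("control", "B", "B"), ("control", "B", "B"),
]


def allele_counts(calls):
    """Count individual alleles per group (two alleles per sample)."""
    counts = {}
    for group, a1, a2 in calls:
        counts.setdefault(group, Counter()).update([a1, a2])
    return counts


def genotype_counts(calls):
    """Count unordered genotypes (AA, AB, BB) per group (one per sample)."""
    counts = {}
    for group, a1, a2 in calls:
        genotype = "".join(sorted((a1, a2)))
        counts.setdefault(group, Counter())[genotype] += 1
    return counts


alleles = allele_counts(calls)
genotypes = genotype_counts(calls)
print(dict(alleles["treated"]), dict(alleles["control"]))
print(dict(genotypes["treated"]), dict(genotypes["control"]))
```

The allele table feeds a 2x2 test, while the genotype table feeds a 2x3 test.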

How should I conduct these analyses

I would recommend reading the methods sections of papers relevant to your research question to decide how to run the analysis for the alleles you want to compare. For a binary trait (e.g. disease status), common choices include logistic regression, the chi-square test, and Fisher's exact test.
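To make one of these concrete, here is a hand-rolled two-sided Fisher's exact test on a 2x2 allele-count table, written with the standard library only. The counts are hypothetical, and the function name is my own. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used instead; Fisher's test is generally preferred over chi-square when counts are small.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


# Hypothetical allele counts: rows are groups, columns are alleles A and B.
treated_a, treated_b = 4, 4
control_a, control_b = 1, 7
p_value = fisher_exact_two_sided(treated_a, treated_b, control_a, control_b)
print(f"two-sided p = {p_value:.4f}")
```

Note that testing allele counts treats the two alleles of each sample as independent observations. If you test many SNPs, the p-values will also need a multiple-testing correction.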

6 months ago

Hi Matthew, Thank you very much for your inputs. I am working with Neogen genotyping data, which includes GC values less than 0.15.
