Hi, I'm new to genetics. I have data that contains SNPs, and the outcome variable is disease severity (severe/mild). I need to perform the Cochran–Armitage test for trend to test the association between each SNP and disease severity, and to get a p-value for each SNP. I read about the test on Wikipedia but I still couldn't grasp the concept. I know that I have an outcome variable with 2 categories, disease severity (severe/mild), and one predictor variable, but I don't understand which value is the predictor: the REF or the ALT column? And if I'm supposed to have only 2 categories in the predictor variable but I have more, what should I do? I would appreciate the help as I'm very confused. I found that there is a CATT package in R to perform the test: catt(y, x, score = c(0, 1, 2))
How do I assign values to the x variable based on my data (for every SNP, should I take the CHROM and POS columns into account)?
Shouldn't I use just one column as x? I don't understand how to assign the x variable and create the groups. Should I take the "CHROM" and "POSITION" columns into account? Thank you.
@iraun Hi, thank you for your answer. Can you clarify how you chose the genotype assignment for x: "0 for homozygous REF, 1 for heterozygous, and 2 for homozygous ALT"? I should include only one column as a predictor, so how can I decide on the groups (0, 1, 2)?
Hi @Eliza. I am not sure exactly what kind of experiment or data you have. In general (as far as I know, but as I said, I am not an expert), if you want to associate the genotype of a gene with a specific condition, you first need case and control individuals. Then, for each individual, you create a column with the genotype. If both alleles carry the nucleotide in the REF column, the individual has the "reference" genotype and is tagged as 0. If the individual is heterozygous, one allele has the REF nucleotide while the other has the ALT nucleotide, and the individual is tagged as 1. The last case is an individual who is homozygous for the nucleotide specified in ALT; these individuals are tagged as 2. Once you have this information, you should organize it in a contingency table and carry out the statistical test to associate the genotype with the condition/disease.
@iraun Thank you, that made it much clearer. If you could clarify one more thing: I'm performing the test for each SNP in the data:
Because I only have SNPs, the genotype can't be zero?
That is correct. If you have SNP data, then your genotype encoding to run CATT will be either 1 or 2, depending on whether the individual is heterozygous (1) or homozygous for ALT (2).
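For anyone who wants to see what the trend test actually computes from the 0/1/2 scores, here is a minimal Python sketch. It is not the R catt function; the function name cochran_armitage and the example counts are made up for illustration, and the variance follows the form given in the Wikipedia article on the test (other implementations may differ slightly).

```python
import math

def cochran_armitage(severe, mild, scores=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test on a 2 x k table of counts.

    severe[i] and mild[i] are the numbers of severe and mild patients
    whose genotype has score scores[i]. Returns (z, p_value).
    """
    r1, r2 = sum(severe), sum(mild)               # row totals
    n = r1 + r2                                   # all patients
    col = [a + b for a, b in zip(severe, mild)]   # column totals

    # Trend statistic: score-weighted difference between the two rows.
    t = (r2 * sum(s * a for s, a in zip(scores, severe))
         - r1 * sum(s * b for s, b in zip(scores, mild)))

    # Variance of t under the null hypothesis of no association.
    s1 = sum(s * c for s, c in zip(scores, col))
    s2 = sum(s * s * c for s, c in zip(scores, col))
    var = (r1 * r2 / n) * (n * s2 - s1 * s1) if n else 0.0

    # No variation in genotype (or an empty group): the test is undefined.
    if var <= 0:
        return float("nan"), float("nan")

    z = t / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value
    return z, p

# Hypothetical counts for genotypes 0, 1, 2 in each severity group.
z, p = cochran_armitage(severe=[2, 5, 8], mild=[8, 5, 2])
print(f"z = {z:.3f}, p = {p:.4f}")
```

The p-value relies on a normal approximation, so it is least trustworthy when the counts in the table are very small.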
There are a lot of SNPs in the data that occur only once...
Ideally you should have genotype information for each individual in your dataset. However, you could assume that individuals who do not have the SNP have the reference genotype (and therefore encode them as 0).
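A rough Python sketch of that encoding, assuming the data is a list of observed calls with one row per patient and SNP. The toy records, the severity table and the helper names are all hypothetical. CHROM and POS are used only to identify which SNP a row belongs to; the single predictor for each SNP is the 0/1/2 genotype.

```python
from collections import defaultdict

# Hypothetical toy data: one row per (patient, SNP) call actually observed.
# Genotype code: 1 = heterozygous, 2 = homozygous ALT.
calls = [
    ("P1", "chr1", 1500, 1),
    ("P2", "chr1", 1500, 2),
    ("P3", "chr1", 1500, 1),
    ("P1", "chr2", 9200, 1),  # seen in a single patient only
]

# Hypothetical phenotype table covering every patient in the study.
severity = {"P1": "severe", "P2": "severe", "P3": "mild", "P4": "mild"}

def genotype_vectors(calls, patients):
    """Return {(chrom, pos): {patient: 0/1/2}}, with 0 for patients lacking a call."""
    per_snp = defaultdict(dict)
    for patient, chrom, pos, gt in calls:
        per_snp[(chrom, pos)][patient] = gt
    return {
        snp: {p: observed.get(p, 0) for p in patients}
        for snp, observed in per_snp.items()
    }

def contingency_table(genotypes, severity):
    """Count genotypes 0/1/2 separately for severe and mild patients."""
    table = {"severe": [0, 0, 0], "mild": [0, 0, 0]}
    for patient, gt in genotypes.items():
        table[severity[patient]][gt] += 1
    return table

vectors = genotype_vectors(calls, sorted(severity))
tables = {snp: contingency_table(g, severity) for snp, g in vectors.items()}

for snp, table in tables.items():
    print(snp, table)
```

Each SNP ends up with its own 2 x 3 table built from all patients, which is the input the trend test expects. Treating a missing call as 0 is an assumption that only holds if the site was actually genotyped in that patient.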
@iraun So if I understand you correctly: I have 21 patients and their SNPs. Some of the SNPs occur only once. For example, patient 1 has 3 SNPs, and the first of them is found only in this individual out of the 21 patients, so it occurs only once in the data (others occur between 2 and 21 times). Should I therefore encode them as 0? I still don't understand how to perform the Armitage test on them, as you can't perform a statistical test on 1 observation. Should I ignore them? Or perform the test on all of them together (different SNPs at different positions on different chromosomes)? I need to perform the test on every SNP.