Cochran–Armitage test
2.1 years ago
Eliza ▴ 40

Hi, I’m new to genetics. I have data containing SNPs, and the outcome variable is disease severity (severe/mild). I need to perform the Cochran–Armitage test for trend to test the association between each SNP and disease severity, and to obtain a P-value for each SNP. I read about the test on Wikipedia but still couldn’t grasp the concept. I know that I have an outcome variable with 2 categories, disease severity (severe/mild), and one predictor variable. But I don’t understand which value is the predictor, the REF or the ALT column. And if I’m supposed to have only 2 categories in the predictor variable but I have more, what should I do? I would appreciate the help, as I’m very confused. I found that there is a CATT package in R to perform the test: catt(y, x, score = c(0, 1, 2))

How do I assign values to the x variable based on my data? For every SNP, should I take the CHROM and POS columns into account?

test SNP Armitage VCF • 1.5k views
2.1 years ago
iraun 6.2k

Hi!

Apart from Wikipedia, there are other valuable sources worth reading; a simple search such as "cochran–armitage test with SNPs" returns many relevant articles.

Not sure about Python, but in R it seems you can use catt to conduct the test.

I have not done this kind of analysis myself, but I would start by:

  • Assigning severity to the y variable, replacing severe with 1 and mild with 0.
  • Assigning genotype to the x variable, using the AF VALUE column to determine the genotype: 0 for homozygous REF, 1 for heterozygous, and 2 for homozygous ALT (for example).
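The two steps above can be sketched in Python. This is only a minimal illustration: the record layout, the field names and the VCF-style genotype strings ("0/0", "0/1", "1/1") are assumptions, since the actual table (including how the AF VALUE column maps to a genotype) is not shown in the post.

```python
# Hypothetical encoding tables; the labels and genotype strings are assumptions.
SEVERITY_CODE = {"mild": 0, "severe": 1}
GENOTYPE_CODE = {"0/0": 0, "0/1": 1, "1/0": 1, "1/1": 2}  # number of ALT alleles


def encode(records):
    """Return (y, x) for ONE SNP: severity codes and ALT-allele counts per patient."""
    y = [SEVERITY_CODE[r["severity"]] for r in records]
    # Treat phased ("0|1") and unphased ("0/1") genotypes the same way.
    x = [GENOTYPE_CODE[r["genotype"].replace("|", "/")] for r in records]
    return y, x


# Made-up records for a single SNP, one row per patient.
records = [
    {"patient": "P1", "severity": "severe", "genotype": "0/1"},
    {"patient": "P2", "severity": "mild", "genotype": "0/0"},
    {"patient": "P3", "severity": "severe", "genotype": "1|1"},
    {"patient": "P4", "severity": "mild", "genotype": "1/0"},
]
y, x = encode(records)
```

Here y and x have one entry per patient, which matches the shape of the arguments in the catt(y, x, score = c(0, 1, 2)) call quoted in the question.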

Shouldn't I use just one column as X? I don't understand how to assign the X variable and create the groups. Should I take the "CHROM" and "POSITION" columns into account? Thank you.

iraun Hi, thank you for your answer. Can you clarify how you chose the genotype assignment for x ("0 for homozygous REF, 1 for heterozygous, and 2 for homozygous ALT")? I should include only one column as a predictor, so how can I decide on the groups (0, 1, 2)?

Hi @Eliza. I am not sure exactly what kind of experiment or data you have. In general (as far as I know; as I said, I am not an expert), if you want to associate the genotype of a gene with a specific condition, you first need case and control individuals. Then, for each individual, you create a column with the genotype. 0 indicates that the individual has the "reference" genotype, in other words, both alleles carry the nucleotide in the REF column. 1 indicates that the individual is heterozygous: one allele has the REF nucleotide and the other has the ALT. 2 indicates that the individual is homozygous for the nucleotide in the ALT column. Once you have this information, you should organize it in a contingency table and carry out the statistical test to associate the genotype with the condition/disease.
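To make the contingency-table step concrete, here is a rough Python sketch of the usual asymptotic form of the Cochran–Armitage trend statistic (a chi-square with 1 degree of freedom). It is not the code of the CATT package, whose exact variance formula may differ slightly, and the function name and example data are made up for illustration.

```python
import math


def cochran_armitage(y, x, scores=(0, 1, 2)):
    """Trend test for one SNP.

    y: 0/1 outcome per individual (for example 0 = mild, 1 = severe).
    x: genotype code per individual (0, 1 or 2), used as an index into scores.
    Returns (chi2, p_value), or None when the test is undefined because
    everyone has the same genotype or the same outcome.
    """
    k = len(scores)
    totals = [0] * k  # individuals per genotype group
    cases = [0] * k   # individuals with y == 1 per genotype group
    for yi, xi in zip(y, x):
        totals[xi] += 1
        cases[xi] += yi
    n = sum(totals)
    r = sum(cases)
    # Score-weighted difference between observed and expected case counts.
    t = sum(s * (c - m * r / n) for s, c, m in zip(scores, cases, totals))
    sx = sum(s * m for s, m in zip(scores, totals))
    sxx = sum(s * s * m for s, m in zip(scores, totals))
    var = (r * (n - r) / n**2) * (sxx - sx**2 / n)
    if var == 0:
        return None
    chi2 = t * t / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up data: six individuals, two per genotype group.
chi2, p_value = cochran_armitage([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
```

The test is run once per SNP, so in practice this function would be called in a loop over SNPs, each with its own x vector and the same y vector.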

iraun, thank you, that made it much clearer. Could you clarify one more thing? I'm performing the test for each SNP in the data:

  1. Because I have only SNPs, the genotype can't be zero?
  2. There are a lot of SNPs in the data that occur only once. Does that mean I can't perform the Armitage test on them to find the association between SNP and disease severity? If so, should I just delete them from the data? Thank you.

"Because I have only SNPs, the genotype can't be zero?"

That is correct: if your data lists only the SNPs that were found, then the genotype encoding of those entries for CATT will be either 1 (heterozygous) or 2 (homozygous ALT).

"There are a lot of SNPs in the data that occur only once..."

Ideally you should have the genotype of every individual in your dataset. However, you could assume that individuals who do not have the SNP have the reference genotype (and therefore encode them with 0).
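As a rough Python sketch of that suggestion: starting from a list of observed variant calls, build one genotype vector per SNP over all patients and fill in 0 wherever a patient has no call. The input layout is an assumption, and keying each SNP by (CHROM, POS) is my own choice for telling SNPs apart. Keep in mind that a missing call can also mean the position was not covered, which this sketch ignores.

```python
from collections import defaultdict


def genotype_vectors(patients, calls):
    """Build per-SNP genotype vectors.

    patients: ordered list of patient ids.
    calls: iterable of (patient, chrom, pos, code) with code 1 or 2.
    Returns {(chrom, pos): [code per patient]}, with 0 for patients
    that have no call at that SNP (assumed homozygous REF).
    """
    index = {p: i for i, p in enumerate(patients)}
    vectors = defaultdict(lambda: [0] * len(patients))
    for patient, chrom, pos, code in calls:
        vectors[(chrom, pos)][index[patient]] = code
    return dict(vectors)


# Made-up example: three patients and two SNPs, one of them seen only once.
patients = ["P1", "P2", "P3"]
calls = [
    ("P1", "chr1", 100, 1),
    ("P3", "chr1", 100, 2),
    ("P2", "chr2", 5, 1),
]
vectors = genotype_vectors(patients, calls)
```

Each vector then serves as the x variable for one SNP, with the same y (severity) vector reused for all SNPs.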

iraun so, if I understand you correctly: I have 21 patients and their SNPs. Some of the SNPs occur only once. For example, patient 1 has 3 SNPs, and the first of them is found only in this individual out of the 21 patients, so it occurs only once in the data (others occur between 2 and 21 times). So should I therefore encode them as 0? I still don't understand how to perform the Armitage test on them, as you can't perform a statistical test on 1 observation. Should I ignore them? Or perform the test on all of them together (different SNPs at different positions and on different chromosomes)? I need to perform the test on every SNP.
