prepare the data for package (HardyWeinberg) to test the HWExactMat
1
0
Entering edit mode
7.8 years ago
mms140130 ▴ 60

My data looks as follows:

1 rs987435  0  2  1  2  2  1   1
2 rs345783  0  0  0  0  0  0   0

I removed the snps with MAF < 5% using the following code

data <- datasnp[rowSums(datasnp==2)/ncol(datasnp) > 0.05, ] , is this correct ??

now I want to test for HWE

The data for the HWE exact should be as follows

               0          1            2
  rs987435    #of zero     #of 1        #of 2

and so on for the rest of snps, I would like to have a code in R to transform the data as I mentioned can any one help

R snp • 1.6k views
ADD COMMENT
2
Entering edit mode
7.8 years ago

Your MAF calculation is wrong, you need to add in half the heterozygous counts (i.e., 0.5*rowSums(datasnp==1), which assumes that the alternate allele is the minor allele).

Anyway, your MAF filter method should get you most of the way to getting the appropriate input for HWE exact:

cbind(rowSums(datasnp==0), rowSums(datasnp==1), rowSums(datasnp==2))
ADD COMMENT
0
Entering edit mode

so what is the correct code for keeping the snps with MAF > 5% ?

ADD REPLY
0
Entering edit mode

What do you think it is?

ADD REPLY
0
Entering edit mode

I'm not sure what did you mean by adding the heterozygous counts? do you mean replace the rowSum(datasnp==2) with 0.5*rowSums(datasnp==1)

ADD REPLY
0
Entering edit mode

My presumption is that 0 indicates homozygous for the reference, 1 heterozygous, and 2 homozygous for the alternate allele. So assuming the reference is the major allele, then the MAF is the number of 2s plus have the number of 1s divided by the number of columns.

ADD REPLY
0
Entering edit mode

Thank you, my question is after I calculate the MAF the correct way , I will remove the row (SNP) according to the code below, right??

data <- datasnp[rowSums(datasnp==2)+0.5*rowSums(datasnp==1) /ncol(datasnp) > 0.05, ]

ADD REPLY
0
Entering edit mode
data <- datasnp[(rowSums(datasnp==2)+0.5*rowSums(datasnp==1)) /ncol(datasnp) > 0.05, ]

Note the extra set of parentheses.

ADD REPLY
0
Entering edit mode

Thank you for answer, I need to understand why the MAF is calculate this way could you please send me a link that explains why

Thank you again

ADD REPLY
0
Entering edit mode

I guess you could just google around for "allele frequency". But frankly this is simply the definition.

ADD REPLY

Login before adding your answer.

Traffic: 1997 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6