My data looks as follows:
1 rs987435 0 2 1 2 2 1 1
2 rs345783 0 0 0 0 0 0 0
I removed the SNPs with MAF < 5% using the following code. Is this correct?
data <- datasnp[rowSums(datasnp==2)/ncol(datasnp) > 0.05, ]
Now I want to test for HWE.
The data for the HWE exact test should look as follows:
         0        1        2
rs987435 # of 0s  # of 1s  # of 2s
and so on for the rest of the SNPs. I would like R code that transforms the data into this layout. Can anyone help?
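A minimal sketch of the per-SNP genotype count table described above. It assumes the genotypes sit in a numeric matrix coded 0/1/2, with one row per SNP, one column per sample, and the rs IDs stored as row names rather than as a data column. The object names geno and counts are made up for illustration.

```r
# Toy genotype matrix built from the two example rows in the question.
# Assumption: SNP IDs are row names, so every column is a sample.
geno <- matrix(
  c(0, 2, 1, 2, 2, 1, 1,
    0, 0, 0, 0, 0, 0, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("rs987435", "rs345783"), NULL)
)

# Count how many samples carry each genotype, per SNP.
# na.rm = TRUE skips missing calls instead of turning the count into NA.
counts <- cbind(
  "0" = rowSums(geno == 0, na.rm = TRUE),
  "1" = rowSums(geno == 1, na.rm = TRUE),
  "2" = rowSums(geno == 2, na.rm = TRUE)
)

print(counts)
```

If the real datasnp object still holds an index or rs ID column, that column would need to be dropped or moved into the row names first. Otherwise it would be compared against 0/1/2 along with the genotypes.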
So what is the correct code for keeping the SNPs with MAF > 5%?
What do you think it is?
I'm not sure what you meant by adding the heterozygous counts. Do you mean replacing rowSums(datasnp==2) with 0.5*rowSums(datasnp==1)?
My presumption is that 0 indicates homozygous for the reference, 1 heterozygous, and 2 homozygous for the alternate allele. So assuming the reference is the major allele, the MAF is the number of 2s plus half the number of 1s, divided by the number of columns.
Thank you. My question is: after I calculate the MAF the correct way, I will remove the rows (SNPs) according to the code below, right?
data <- datasnp[rowSums(datasnp==2)+0.5*rowSums(datasnp==1) /ncol(datasnp) > 0.05, ]
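Almost. Division binds tighter than addition, so as written only the heterozygote term is divided by the number of columns. The sum needs to be wrapped first:
data <- datasnp[(rowSums(datasnp==2)+0.5*rowSums(datasnp==1)) / ncol(datasnp) > 0.05, ]
Note the extra set of parentheses.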
Thank you for the answer. I would like to understand why the MAF is calculated this way. Could you please send me a link that explains it?
Thank you again.
I guess you could just google around for "allele frequency". But frankly this is simply the definition.
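To make the definition concrete, here is a small worked sketch using the rs987435 genotypes from the question. It assumes 0/1/2 counts copies of the alternate allele in a diploid sample, and the variable names are only illustrative. Each individual carries two alleles: a 2 contributes two alternate alleles and a 1 contributes one. That is where "number of 2s plus half the number of 1s, divided by the number of samples" comes from.

```r
# Genotypes for rs987435, coded as copies of the alternate allele.
g <- c(0, 2, 1, 2, 2, 1, 1)

n_samples   <- length(g)                      # 7 individuals = 14 alleles
alt_alleles <- 2 * sum(g == 2) + sum(g == 1)  # 2*3 + 3 = 9 alternate alleles
alt_freq    <- alt_alleles / (2 * n_samples)  # 9/14

# The shortcut from the thread is the same quantity with the 2 cancelled.
shortcut <- (sum(g == 2) + 0.5 * sum(g == 1)) / n_samples
stopifnot(isTRUE(all.equal(alt_freq, shortcut)))

# The minor allele is whichever allele is rarer, so take the smaller side.
maf <- min(alt_freq, 1 - alt_freq)            # 5/14
```

For this SNP the alternate allele is actually the more common one (9/14), so the assumption that the reference is the major allele does not hold here. Taking min(p, 1 - p) guards against that case. A filter on the raw alternate allele frequency alone would keep this SNP either way but could miss SNPs where the reference allele is the rare one.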