7.4 years ago
manay
Hi. When my data set contains only 0 and 1, I can standardize it easily as follows (each row is a chromosome and each column is a SNP):
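# Per-SNP frequency of the allele coded 1, ignoring missing values
p <- apply(input, 2, mean, na.rm = TRUE)
# Empty result matrix: nchr chromosomes (rows) by nsnp SNPs (columns)
mat <- matrix(NA_real_, nrow = nchr, ncol = nsnp)
for (i in 1:nsnp) {
  # Centre by the frequency and scale by the binomial standard deviation
  mat[, i] <- (input[, i] - p[i]) / sqrt(p[i] * (1 - p[i]))
}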
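To make the calculation concrete, here is a minimal sketch of the same standardization on a made-up 0/1 matrix. The object names `geno` and `std` and the toy values are assumptions for illustration only and are not part of my real data; `sweep()` is used in place of the loop but should give the same result.

```r
# Toy 0/1 matrix: 4 chromosomes (rows) x 3 SNPs (columns); values are made up.
# matrix() fills column by column, so each line below is one SNP.
geno <- matrix(c(0, 1, 1, 0,
                 1, 1, 0, 1,
                 0, 0, 1, NA),
               nrow = 4, ncol = 3)

# Per-SNP frequency of the allele coded 1, ignoring missing calls.
p <- colMeans(geno, na.rm = TRUE)

# Centre each column by p, then divide by sqrt(p * (1 - p)).
std <- sweep(geno, 2, p, "-")
std <- sweep(std, 2, sqrt(p * (1 - p)), "/")
```

Note that a SNP with the same value in every row has p equal to 0 or 1, so the denominator is zero and that column becomes NaN; such columns would probably need to be dropped first.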
However, I have a data set that is coded as A, T, G, C instead. Is it possible to standardize this kind of data?
Here is a small part of my data:
NA06989_A A A G G C
NA06989_B A A G G C
NA10850_A G A G G C
NA10850_B G G A G C
NA06984_A G G A G C
NA06984_B A A G G C
NA11917_A G A A T C
NA11917_B A A G G C
NA12282_A A A G G C
NA12282_B G G A G C