4.8 years ago
strkiky2
Here's a dataset:
data <- t(data.frame(met1 = c(2,2,2,2,2),
met2 = c(5,4,NA,2,1),
met3 = c(2,2,2,NA,2),
met4 = c(2,4,6,8,6),
met5 = c(1,3,4,7,2)))
This gives:
     [,1] [,2] [,3] [,4] [,5]
met1    2    2    2    2    2
met2    5    4   NA    2    1
met3    2    2    2   NA    2
met4    2    4    6    8    6
met5    1    3    4    7    2
I often apply a row-wise correction to my dataset: each value is divided by the sum of its row, so that all the values lie between 0 and 1.
data <- data / rowSums(data, na.rm = TRUE)
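This works great when there's no missing data. But as you can see when comparing met1 and met3, each value of met3 is considerably higher than the corresponding value of met1 because of the missing data: the met3 row sum covers only four values (2/8 = 0.25), while the met1 row sum covers five (2/10 = 0.20).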
           [,1]      [,2]      [,3]      [,4]       [,5]
met1 0.20000000 0.2000000 0.2000000 0.2000000 0.20000000
met2 0.41666667 0.3333333        NA 0.1666667 0.08333333
met3 0.25000000 0.2500000 0.2500000        NA 0.25000000
met4 0.07692308 0.1538462 0.2307692 0.3076923 0.23076923
met5 0.05882353 0.1764706 0.2352941 0.4117647 0.11764706
How could I offset this effect? Currently I remove any column with missing data, but I would prefer not to, as some important data could be lost.
You can replace the missing values with the row means: https://stackoverflow.com/questions/6918086/replace-na-values-by-row-means
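A minimal sketch of that idea applied to the dataset from the question, using base R only. The names `na_idx`, `imputed` and `normalized` are made up for illustration, and this is just one way to do it, not necessarily the approach in the linked thread.

```r
data <- t(data.frame(met1 = c(2, 2, 2, 2, 2),
                     met2 = c(5, 4, NA, 2, 1),
                     met3 = c(2, 2, 2, NA, 2),
                     met4 = c(2, 4, 6, 8, 6),
                     met5 = c(1, 3, 4, 7, 2)))

# Locate the missing cells as (row, col) index pairs.
na_idx <- which(is.na(data), arr.ind = TRUE)

# Replace each missing cell with the mean of the observed values in its row.
imputed <- data
imputed[na_idx] <- rowMeans(data, na.rm = TRUE)[na_idx[, "row"]]

# Row-wise correction on the filled-in matrix (no NAs left, so na.rm is not needed).
normalized <- imputed / rowSums(imputed)
```

With this, met3 should come out as 0.2 in every column, the same as met1. For the observed cells the result is the same as `data / (rowMeans(data, na.rm = TRUE) * ncol(data))`, which would leave the NAs in place instead of filling them in, if you would rather not carry imputed values forward.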