5.3 years ago
WUSCHEL
By looking at density distribution curves, can I decide on the best imputation method for omics data?
# All available imputation methods are listed in the error message if an invalid function name is given.
impute(data_norm, fun = "")
## Error in match.arg(fun): 'arg' should be one of "bpca", "knn", "QRILC", "MLE", "MinDet", "MinProb", "man", "min", "zero", "mixed", "nbavg"
# Impute missing data using random draws from a Gaussian distribution centered around a minimal value (for MNAR)
data_imp <- impute(data_norm, fun = "MinProb", q = 0.01)
# Impute missing data using random draws from a manually defined left-shifted Gaussian distribution (for MNAR)
data_imp_man <- impute(data_norm, fun = "man", shift = 1.8, scale = 0.3)
# Impute missing data using the k-nearest neighbour approach (for MAR)
data_imp_knn <- impute(data_norm, fun = "knn", rowmax = 0.9)
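The `fun = "man"` call above draws replacements from a Gaussian shifted below each sample's observed intensities. A minimal base-R sketch of that idea follows, assuming log-scale intensities in a numeric matrix with features in rows and samples in columns. `impute_left_shifted` is a hypothetical helper written for illustration, not the package's implementation, and the data are simulated.

```r
# Hypothetical helper (not the package's code): left-shifted Gaussian imputation.
# mat: log-scale intensities, features in rows, samples in columns.
impute_left_shifted <- function(mat, shift = 1.8, scale = 0.3) {
  out <- mat
  for (j in seq_len(ncol(mat))) {
    miss <- is.na(mat[, j])
    if (!any(miss)) next
    mu <- mean(mat[, j], na.rm = TRUE)
    s <- sd(mat[, j], na.rm = TRUE)
    # Replacements come from N(mu - shift * s, (scale * s)^2): centred below
    # this sample's observed values and narrower than them.
    out[miss, j] <- rnorm(sum(miss), mean = mu - shift * s, sd = scale * s)
  }
  out
}

# Made-up example: low intensities go missing (MNAR-like).
set.seed(42)
x <- matrix(rnorm(200, mean = 25, sd = 2), nrow = 50, ncol = 4)
x[x < 23] <- NA
x_imp <- impute_left_shifted(x)
print(summary(as.vector(x_imp[is.na(x)])))
```

Because the imputed values sit around `mu - shift * s` with a spread of `scale * s`, the density of the imputed matrix is expected to gain a narrow hump or shoulder to the left of the observed distribution; larger `shift` moves it further left.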
The effect of the imputation on the distributions can be visualized.
# Plot intensity distributions before and after imputation
plot_imputation(data_norm, data_imp)
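`plot_imputation()` overlays density curves, which shows where the imputed values land but not how accurate they are. A hedged base-R sketch of the same comparison in numbers: per sample, how many values were imputed, their median versus the observed median, and the fraction of distinct imputed values (equal to 1 / n_imputed when every gap received one constant, which appears as a spike in the density). `compare_imputation` is hypothetical, and the per-sample 1% quantile fill is only a deterministic stand-in similar in spirit to `MinDet`.

```r
# Hypothetical helper: per-sample summary of imputed vs observed values,
# with an optional density overlay for one sample (similar to plot_imputation()).
compare_imputation <- function(before, after, which_sample = 1, plot = interactive()) {
  stopifnot(all(dim(before) == dim(after)), !anyNA(after))
  rows <- lapply(seq_len(ncol(before)), function(j) {
    miss <- is.na(before[, j])
    data.frame(
      sample = j,
      n_imputed = sum(miss),
      median_observed = median(before[!miss, j]),
      median_imputed = if (any(miss)) median(after[miss, j]) else NA_real_,
      # 1 / n_imputed when every gap received the same constant (a density spike)
      frac_distinct = if (any(miss)) length(unique(after[miss, j])) / sum(miss) else NA_real_
    )
  })
  if (plot) {
    d_before <- density(before[, which_sample], na.rm = TRUE)
    d_after <- density(after[, which_sample])
    plot(d_after, ylim = range(0, d_before$y, d_after$y),
         main = paste("Sample", which_sample, ": before (dashed) vs after (solid)"))
    lines(d_before, lty = 2)
  }
  do.call(rbind, rows)
}

# Made-up data; the per-sample 1% quantile fill is only a deterministic stand-in.
set.seed(7)
before <- matrix(rnorm(300, mean = 25, sd = 2), nrow = 100, ncol = 3)
before[before < 23] <- NA
after <- apply(before, 2, function(v) {
  v[is.na(v)] <- unname(quantile(v, 0.01, na.rm = TRUE))
  v
})
res <- compare_imputation(before, after)
print(res)
```

Plotting defaults to `interactive()` so the sketch also runs in a non-interactive script; pass `plot = TRUE` to draw the overlay.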
What parameters should I look at to decide on the best imputation method when working with several genotypes?
What's the data? There are established procedures for dealing with missing data for some data types. Read the literature related to your data to look for commonly used imputation methods.
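One way to make the choice less subjective than eyeballing density curves is a masking experiment: hide some values that were actually observed, impute them with each candidate method, and compare the error on the hidden values. A minimal base-R sketch, assuming a log-intensity matrix without missing values (for example the fully observed rows of your data). The two candidates are crude stand-ins for MNAR-style (per-sample minimum) and MAR-style (row mean, in place of kNN) imputation, and all function names are hypothetical. Masking observed values only approximates the real missingness mechanism, so the ranking is a guide to read alongside the literature for your data type.

```r
# Hide a fraction of known values; with mnar = TRUE, low values are more
# likely to be hidden, loosely mimicking detection-limit missingness.
mask_values <- function(mat, frac = 0.1, mnar = TRUE) {
  n_mask <- round(frac * length(mat))
  weights <- if (mnar) rank(-mat)^2 else rep(1, length(mat))
  idx <- sample(length(mat), n_mask, prob = weights)
  masked <- mat
  masked[idx] <- NA
  list(masked = masked, idx = idx)
}

# Candidate imputation functions (simplified stand-ins, not the package's methods).
candidates <- list(
  # MNAR-style: fill each sample's gaps with that sample's minimum
  min_per_sample = function(m) {
    apply(m, 2, function(v) {
      v[is.na(v)] <- min(v, na.rm = TRUE)
      v
    })
  },
  # MAR-style: fill gaps with the feature's mean across samples
  row_mean = function(m) {
    means <- rowMeans(m, na.rm = TRUE)
    means[is.nan(means)] <- mean(m, na.rm = TRUE)  # rows with nothing observed
    miss <- is.na(m)
    m[miss] <- means[row(m)[miss]]
    m
  }
)

# Root-mean-square error on the hidden values, averaged over repeats.
evaluate_imputation <- function(complete, candidates, frac = 0.1,
                                mnar = TRUE, n_rep = 5) {
  sapply(candidates, function(impute_fn) {
    errors <- vapply(seq_len(n_rep), function(i) {
      m <- mask_values(complete, frac = frac, mnar = mnar)
      imputed <- impute_fn(m$masked)
      sqrt(mean((imputed[m$idx] - complete[m$idx])^2))
    }, numeric(1))
    mean(errors)
  })
}

# Simulated log2 intensities: 200 features x 6 samples with a strong
# feature effect (made-up data for illustration only).
set.seed(123)
feature_level <- rnorm(200, mean = 25, sd = 2)
complete <- matrix(rnorm(200 * 6, mean = feature_level, sd = 0.3), nrow = 200)

rmse <- evaluate_imputation(complete, candidates)
print(sort(rmse))
```

With this strong simulated feature effect the row-mean stand-in is expected to beat the per-sample minimum; on real data with genuinely MNAR gaps the ranking can change, which is why the comparison is worth running on your own matrix.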