Hi all,
I am starting to work with proteomic data and am quite new to it. The first challenge I am facing is dealing with NA values and running imputation. My data has been normalized but it is skewed, e.g.:
   Min. 1st Qu.  Median    Mean 3rd Qu.      Max.    NA's
    220    8123   18080  702594   48181 132186128     162
So my first temptation would be to log-transform it to get closer to a normal distribution and then run imputation on the log-transformed data.
Is this a valid approach, or would it be better to run imputation on the 'skewed' data and then log-transform the results?
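To make the two orderings concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption for illustration: the intensities are synthetic log-normal values (not my real data), the helper names are made up, and a plain mean fill stands in for whatever imputation method would actually be used. It only shows that the order of the two steps can change the imputed values.

```python
import numpy as np

def impute_mean(values):
    """Fill NaNs with the mean of the observed values (a simple stand-in imputer)."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    filled[np.isnan(filled)] = np.nanmean(values)
    return filled

def log_then_impute(values):
    """Option 1: log2-transform first, then impute on the log scale."""
    return impute_mean(np.log2(values))

def impute_then_log(values):
    """Option 2: impute on the raw (skewed) scale, then log2-transform."""
    return np.log2(impute_mean(values))

# Synthetic, right-skewed intensities with some values set to missing (assumed data).
rng = np.random.default_rng(0)
intensities = rng.lognormal(mean=10, sigma=2, size=1000)
intensities[rng.choice(1000, size=100, replace=False)] = np.nan
missing = np.isnan(intensities)

log_first = log_then_impute(intensities)
impute_first = impute_then_log(intensities)

# Observed values are identical either way; only the filled-in values differ.
print("imputed value, log first:   ", log_first[missing][0])
print("imputed value, impute first:", impute_first[missing][0])
```

With a mean fill, the value imputed on the raw scale comes out larger after the log, because the mean of the raw values is pulled up by the long right tail (the log of a mean is never smaller than the mean of the logs). Other imputation methods may behave differently, so this is only meant to illustrate the question, not to answer it.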
I would also appreciate it if someone could point me to some extra documentation that supports this decision!
Thank you!