My phenotype is not normally distributed. I tried several transformations but none of them seems to improve the normality. I am now interested in an inverse normal transformation using R.
Is something like the following correct, x being my phenotype?
qx <- qnorm((rank(x)-0.5)/sum(x))
It is based on this paper:
Yang, Jian, et al. "FTO genotype is associated with phenotypic variability of body mass index." Nature 490.7419 (2012): 267.
Histogram before transformation available here: https://imgur.com/a/BwednJB
More information about the raw data:
- min: 10
- max: 750
- median: 54.75
- mean: 86.18217
- variance: 8428.881
- standard deviation: 91.80894
Please show a histogram of your phenotype before any transformation. Also provide min, max, median, mean, variance, and standard deviation.
Thank you for your comment - the information has been added to the initial post.
I see - thanks so much. So it currently looks like a negative binomial or Poisson-like distribution, akin to how RNA-seq count data is measured and normalised. Have you considered a variance stabilising transformation, or a regularised log (like in DESeq2)? You could also just fit the model as a negative binomial using glm.nb from the MASS package.
Thank you. I have considered log, square root, cube root and Johnson transformation for now. I will have a look at DESeq2. What do you think about the inverse normal transformation and the script based on Yang et al.'s paper?
Where did you find that formula? I looked in the paper and they mention that they tried un-transformed, logged, and then inverse normal transformed values. I could not see a formula, though.
I found it in the Supplementary Information, page 18 (https://media.nature.com/original/nature-assets/nature/journal/v490/n7419/extref/nature11401-s1.pdf)
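For reference, here is a minimal sketch of a rank-based inverse normal transformation in R. It is not taken verbatim from the paper. It assumes the usual form, in which the denominator is the number of non-missing observations rather than sum(x). The function name inverse_normal, the offset argument and the simulated data are made up for illustration.

```r
# Rank-based inverse normal transformation (sketch).
# x: numeric phenotype vector; missing values stay NA in the output.
inverse_normal <- function(x, offset = 0.5) {
  n <- sum(!is.na(x))                                  # count of non-missing values
  r <- rank(x, na.last = "keep", ties.method = "average")
  qnorm((r - offset) / n)                              # map ranks to normal quantiles
}

# Simulated right-skewed data standing in for the real phenotype (assumption)
set.seed(1)
x <- c(rexp(200, rate = 1 / 80) + 10, NA)

qx <- inverse_normal(x)
summary(qx)
```

With sum(x) in the denominator, as in the line from the question, the arguments passed to qnorm are not proper quantiles. For data with a mean of about 86 they would all be below roughly 0.012, so every transformed value would come out negative.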