Inverse normal transformation
6.6 years ago
rednalf ▴ 90

My phenotype is not normally distributed. I tried several transformations but none of them seems to improve the normality. I am now interested in an inverse normal transformation using R.

Is something like the following correct, x being my phenotype?

qx <- qnorm((rank(x)-0.5)/sum(x))

It is based on this paper:

Yang, Jian, et al. "FTO genotype is associated with phenotypic variability of body mass index." Nature 490.7419 (2012): 267.

Histogram before transformation available here: https://imgur.com/a/BwednJB

More information about the raw data:

  • min: 10
  • max: 750
  • median: 54.75
  • mean: 86.18217
  • variance: 8428.881
  • standard deviation: 91.80894

Please show a histogram of your phenotype before any transformation. Also provide min, max, median, mean, variance, and standard deviation.


Thank you for your comment - the information has been added to the initial post.


I see, thanks so much. So it currently looks like a negative binomial or Poisson-like distribution, akin to how RNA-seq count data are distributed before normalisation. Have you considered a variance-stabilising transformation, or a regularised log (as in DESeq2)? You could also just fit the model as a negative binomial using glm.nb (from the MASS package).
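As a rough illustration of the negative binomial suggestion, here is a minimal sketch on simulated data. The variable names (pheno, snp) and the simulated effect size are hypothetical, and it assumes the phenotype consists of integer counts, which may not hold for the phenotype in the question (its median is 54.75).

```r
library(MASS)  # provides glm.nb()

set.seed(123)
n <- 500
snp <- rbinom(n, size = 2, prob = 0.3)                  # hypothetical genotype coded 0/1/2
pheno <- rnbinom(n, mu = exp(4 + 0.2 * snp), size = 2)  # simulated over-dispersed counts

# Negative binomial regression with a log link (the default for glm.nb)
fit <- glm.nb(pheno ~ snp)

summary(fit)$coefficients  # estimates are on the log scale
fit$theta                  # estimated dispersion parameter
```

If the phenotype is continuous rather than a count, a transformation-based approach is probably the more natural choice.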


Thank you. I have considered log, square root, cube root and Johnson transformation for now. I will have a look at DESeq2. What do you think about the inverse normal transformation and the script based on Yang et al.'s paper?


Where did you find that formula? I looked in the paper and they mention that they tried the un-transformed, logged, and then inverse normal transformed phenotype. I could not see a formula, though.

6.6 years ago

Okay, yes, here is the page:

[Screenshot of the relevant page from the paper]

Note that they are not transforming the original variable. What they do is the following (for height and weight):

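  1. build a linear regression model lm(height ~ age + I(age^2)) (inside an R formula, age^2 must be wrapped in I() to be treated as a squared term)
  2. extract the residuals from the model with residuals()
  3. transform the residuals with the inverse normal function y <- qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x))), where x is the vector of residuals. Note that the denominator is the number of non-missing observations, not sum(x).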
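The three steps above can be sketched roughly as follows. The data are simulated and the helper name inverse_normal is my own, not something taken from the paper, so treat this as an illustration only.

```r
# Rank-based inverse normal transformation; missing values stay NA
inverse_normal <- function(x) {
  qnorm((rank(x, na.last = "keep") - 0.5) / sum(!is.na(x)))
}

set.seed(1)
n <- 200
age <- runif(n, 18, 80)                                       # simulated ages
height <- 170 + 0.3 * age - 0.004 * age^2 + rnorm(n, sd = 7)  # simulated phenotype
height[c(5, 50)] <- NA                                        # a few missing values

# Step 1: regress the phenotype on age and age squared.
# na.exclude keeps the residual vector the same length as the input.
fit <- lm(height ~ age + I(age^2), na.action = na.exclude)

# Step 2: extract the residuals (NA where height was missing)
res <- residuals(fit)

# Step 3: inverse normal transform the residuals
y <- inverse_normal(res)

summary(y)
```

Because the transformation depends only on ranks, y keeps the ordering of the residuals but is forced onto standard normal quantiles.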

The transformed residuals (squared) are then used in your association test, as follows:

glm(y^2 ~ SNP)
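For illustration, here is a small self-contained sketch of that test on simulated data. The 0/1/2 genotype coding and the simulated variance effect are assumptions of mine, not details taken from the paper.

```r
set.seed(42)
n <- 1000
SNP <- rbinom(n, size = 2, prob = 0.3)  # hypothetical additive genotype coding (0/1/2)
res <- rnorm(n, sd = 1 + 0.3 * SNP)     # simulated residuals whose spread grows with genotype

# Inverse normal transform of the residuals
y <- qnorm((rank(res, na.last = "keep") - 0.5) / sum(!is.na(res)))

# Regress the squared transformed residuals on genotype.
# glm() defaults to the gaussian family, so this matches lm(y^2 ~ SNP).
fit <- glm(y^2 ~ SNP)
summary(fit)$coefficients
```

A non-zero SNP coefficient here points to a difference in phenotypic variability between genotypes, rather than a difference in the mean.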

Does that help?

Kevin


Yes a lot, thank you very much for your (fast) help!


No problem. Note that they also split the analysis in two by gender, and include only individuals over 18 years of age.
