Hi all,
I'm performing an eQTL analysis on about 120 samples combined with 5 million genotypes. As covariates I initially used RIN, age, processing day and gender; the results looked good and had a replication rate of about 40% compared to previous studies. A colleague then told me that I should also include the first five components of a PCA of the gene expression as covariates. I did that, and then someone warned me that the covariates could be orthogonal and that I could be overcorrecting. I have now redone the analysis with four different sets of covariates (the 4 original covariates, the first 5 PCA components, the first 20 and the first 40), and I get more significant results when I include more covariates, so now I am a bit confused. I have three questions and was hoping someone could help me out with them:
What is the best number of covariates to include in the analysis? Is there some kind of optimal number of covariates that can be calculated? Looking at the results, the number of covariates doesn't seem to matter that much: about 75% of the results look consistent.
How can I check whether covariates are orthogonal? Is a simple Pearson correlation above 0.50 or below -0.50 already enough evidence for orthogonal covariates?
Why does the number of significant results increase when more covariates are included? I was expecting that the more you correct, the fewer significant results you would get.
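To make the second question concrete, this is roughly the check I have in mind: compute the pairwise Pearson correlations between covariates and flag the pairs beyond the 0.50 cutoff. It is only a sketch in Python with simulated data. The function name `flag_correlated_covariates`, the simulated values, and the "PC1" column (deliberately built to track RIN) are all made up for illustration and are not my real data.

```python
import numpy as np

def flag_correlated_covariates(cov, names, threshold=0.5):
    """Return (name_i, name_j, r) for covariate pairs with |Pearson r| > threshold.

    cov is a samples x covariates array; names labels its columns.
    """
    r = np.corrcoef(cov, rowvar=False)  # covariates are in columns
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                flagged.append((names[i], names[j], float(r[i, j])))
    return flagged

# Simulated covariates for 120 samples (hypothetical values, not real data)
rng = np.random.default_rng(0)
n_samples = 120
rin = rng.normal(7.5, 1.0, n_samples)
age = rng.normal(50, 10, n_samples)
# Hypothetical expression PC constructed to be strongly related to RIN
pc1 = 0.9 * rin + rng.normal(0, 0.3, n_samples)

covariates = np.column_stack([rin, age, pc1])
names = ["RIN", "age", "PC1"]

flagged = flag_correlated_covariates(covariates, names)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.2f}")
```

With my real data I would replace the simulated columns with the actual covariate table (known covariates plus expression PCs). The 0.50 threshold is just the cutoff I mentioned above; I don't know whether it is a sensible one.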