I am estimating SNP effects with plink2. The phenotype comes from an external file rather than the .fam file, and I use the same file (different columns) as the covariate file. When I run the first command below, with columns 7 to 11 as covariates, plink2 prints the following message and stops, yet the other two commands, which use column 11 alone and columns 7-10, run without complaint. Is there a problem with this combination of covariates?
Start time: Mon Jan 21 19:25:22 2019
257682 MB RAM detected; reserving 128841 MB for main workspace.
Using up to 4 compute threads.
284516 samples (0 females, 0 males, 284516 ambiguous; 284516 founders) loaded
from ukbb_dis.fam.
1133273 variants loaded from ukbb_dis.bim.
1 quantitative phenotype loaded (284516 values).
3 covariates loaded from phenotype_dis.pheno.
Warning: Skipping --glm regression on phenotype 'PHENO1' since variance
inflation factor for covariate 'COVAR1' is too high. You may want to remove
redundant covariates and try again.
End time: Mon Jan 21 19:25:23 2019
First command, which produced the above warning:
./plink2 --bfile ukbb_dis --pheno phenotype_dis.pheno --pheno-col-nums 6 --covar phenotype_dis.pheno --covar-number 7-11 --input-missing-phenotype -10000000 --linear --adjust --out assoc_SNP_height_linear_plink2 --threads 4
The other two commands, which work fine:
./plink2 --bfile ukbb_dis --pheno phenotype_dis.pheno --pheno-col-nums 6 --covar phenotype_dis.pheno --covar-number 7-10 --input-missing-phenotype -10000000 --linear --adjust --out assoc_SNP_height_linear_plink2 --threads 4
Or
./plink2 --bfile ukbb_dis --pheno phenotype_dis.pheno --pheno-col-nums 6 --covar phenotype_dis.pheno --covar-number 11 --input-missing-phenotype -10000000 --linear --adjust --out assoc_SNP_height_linear_plink2 --threads 4
These last two commands run and produce the output below, although they would take a really long time to finish:
Start time: Mon Jan 21 19:46:53 2019
257682 MB RAM detected; reserving 128841 MB for main workspace.
Using up to 4 compute threads.
284516 samples (0 females, 0 males, 284516 ambiguous; 284516 founders) loaded
from ukbb_dis.fam.
1133273 variants loaded from ukbb_dis.bim.
1 quantitative phenotype loaded (284516 values).
4 covariates loaded from phenotype_dis.pheno.
--glm linear regression on phenotype 'PHENO1': 0%
Moreover, here are two lines of the phenotype_dis.pheno file. The first six columns are as in the .fam file; columns 7 to 11 are batch, centre, year of birth, sex and age. The file has no header.
3319618 3319618 0 0 0 173 2000 11011 1959 1 49
4567961 4567961 0 0 0 181 19 11018 1949 1 60
Your help is really appreciated!
Thank you very much for your help, Christopher!
Hi Chang, I've done what you said, but plink2 still stops with a message. Could I ask for your help again? Thank you very much!
Here is my code:
And here is the output:
Oh, I should have also noted that year-of-birth and age are almost totally redundant; you need to remove one of those covariates.
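To see this redundancy outside plink2, one option is to compute the Pearson correlation between the two columns directly. The following is only a minimal Python sketch, not anything plink2 provides. The helper names `load_columns` and `pearson` are made up for illustration. The column positions (9 for year of birth, 11 for age) and the missing code -10000000 are taken from the post above. The first two toy pairs come from the two sample lines shown; the other three are invented.

```python
import math

def load_columns(path, cols, missing=-10000000.0):
    """Read 1-based columns from a whitespace-delimited file with no header.

    Rows in which any selected column equals the missing code are skipped.
    (Hypothetical helper, written for this example only.)
    """
    out = [[] for _ in cols]
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if not fields:
                continue
            vals = [float(fields[c - 1]) for c in cols]
            if missing in vals:
                continue
            for store, v in zip(out, vals):
                store.append(v)
    return out

def pearson(x, y):
    """Plain Pearson correlation; fails if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy values: the first two pairs are from the sample lines, the rest are invented.
yob = [1959, 1949, 1962, 1941, 1955]
age = [49, 60, 46, 67, 54]
print("correlation(year of birth, age) =", pearson(yob, age))

# On the real file one would call, for example:
#   yob, age = load_columns("phenotype_dis.pheno", [9, 11])
#   print(pearson(yob, age))
```

A correlation very close to -1 or +1 means that one column is nearly a linear function of the other, which is the situation described above, where age and year of birth carry almost the same information.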
Hi Chang, it seems that my centre column has fewer unique values than my batch column. Yet batch causes no problem while centre does, and I still have no clue why plink2 reports that the variance inflation factor is too high for centre. Is there any way to work this out by hand or debug it manually? Thank you.