I usually use a script (run.sh) to run my PLINK analysis (sort of like a makefile).
Has anyone managed to run PLINK with GNU Parallel?
If so, how do I use GNU Parallel with a script?
I tried the following, but I cannot see 20 cores in use (checked with top).
>parallel -j 20 --progress | ./run.sh
Any help appreciated. Thanks
Edit: Yes, I have 40 cores. Normally I would run it like this:
Using the file MyCovarfile.raw (structure shown at the bottom) for Analysis of Pheno1 and covars Age, PC1, PC2, PC3.
plink --bfile myfile \
--pheno MyCovarfile.raw \
--pheno-name Pheno1 \
--covar MyCovarfile.raw \
--covar-name Age-PC3 \
--logistic \
--adjust \
--qq-plot \
--out Pheno1_Age_PC3
Now, I can do the same with many different covariate models (my run.sh is a list of such commands with different combinations of covariates and phenotypes), but right now they get executed serially, one after the other, on a single core.
Using the file MyCovarfile.raw (structure shown at the bottom) for Analysis of Pheno2 and covars Age, PC1, PC2, PC3.
plink --bfile myfile \
--pheno MyCovarfile.raw \
--pheno-name Pheno2 \
--covar MyCovarfile.raw \
--covar-name Age-PC3 \
--logistic \
--adjust \
--qq-plot \
--out Pheno2_Age_PC3
Using the file MyCovarfile.raw (structure shown at the bottom) for Analysis of Pheno1 and covars Age, PC1, PC2, PC3, PC4.
plink --bfile myfile \
--pheno MyCovarfile.raw \
--pheno-name Pheno1 \
--covar MyCovarfile.raw \
--covar-name Age-PC4 \
--logistic \
--adjust \
--qq-plot \
--out Pheno1_Age_PC4
Using the file MyCovarfile.raw (structure shown at the bottom) for Analysis of Pheno2 and covars Age, PC1, PC2, PC3, PC4.
plink --bfile myfile \
--pheno MyCovarfile.raw \
--pheno-name Pheno2 \
--covar MyCovarfile.raw \
--covar-name Age-PC4 \
--logistic \
--adjust \
--qq-plot \
--out Pheno2_Age_PC4
Hope this helps. http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#covar
Structure of MyCovarfile.raw
FID IID AFF Pheno1 Pheno2 Pheno3 Pheno4 Pheno5 Pheno6 Pheno7 Bin AGE PC1 PC2 PC3 PC4
0001 9542 1 1 1 1 1 1 1 1 1 8 -0.0053 -0.0046 0.0036 -0.0052
0002 9606 1 1 1 1 1 1 1 1 1 3 -0.0052 -0.0045 0.0035 -0.0021
0003 9702 2 1 1 1 1 1 1 1 1 3 -0.0045 -0.0041 0.0032 0.0016
0004 9544 2 1 1 1 1 1 1 1 1 5 -0.0037 -0.0028 0.0032 0.0003
where FID, IID and AFF mean family ID, individual ID and affection status of the individual.
Let's start with the obvious: do you have 20 cores in your machine?
I don't know how to normally run PLINK on 2 sets of data. Please show how you would run PLINK on two sets of data in serial (if you cannot do that you most likely cannot use GNU Parallel). Also please state whether you have watched the intro videos:
Hi Chris, Yes, I have 40 cores.
Ole Tange, yes, I watched the intro videos and tried reading the manual too.
Normally I would do like this:
Now, I can do the same with many different covariate models, but right now they get executed serially, one after the other, on a single core.
Hope this helps.
Yes, I have 40 cores. Yes, I watched the video intro and read a bit of the manual without really understanding it.
There is a really nice intro to installation of GNU Parallel here.
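A possible way forward, offered as a sketch rather than a tested recipe: in `parallel -j 20 --progress | ./run.sh`, GNU Parallel is given no command and no input file, so it waits for jobs on standard input, while run.sh simply runs serially as before. GNU Parallel treats each input line as one job, so backslash-continued commands like the ones in the question would most likely need to be flattened to one command per line first. The script below writes the four PLINK commands from the question to a job file, one per line. The file name plink_jobs.txt and the loop variables are assumptions made up for illustration; the PLINK options are copied from the question.

```shell
#!/bin/sh
# Sketch: write one complete plink command per line, so that GNU Parallel
# can treat every line as an independent job.
# plink_jobs.txt is a hypothetical file name chosen for this example.
set -eu

jobs_file="plink_jobs.txt"
: > "$jobs_file"   # start with an empty job file

for pheno in Pheno1 Pheno2; do
  for covars in Age-PC3 Age-PC4; do
    # Output prefix follows the naming in the question, e.g. Pheno1_Age_PC3
    out="${pheno}_$(printf '%s' "$covars" | tr '-' '_')"
    printf 'plink --bfile myfile --pheno MyCovarfile.raw --pheno-name %s --covar MyCovarfile.raw --covar-name %s --logistic --adjust --qq-plot --out %s\n' \
      "$pheno" "$covars" "$out" >> "$jobs_file"
  done
done

# To run up to 20 of these jobs at once (needs GNU Parallel and plink on PATH):
#   parallel -j 20 --progress < plink_jobs.txt
```

Each job writes to its own --out prefix, so the runs should not overwrite each other's results. With only four lines in the job file, at most four cores will be busy no matter what -j is set to; more phenotype and covariate combinations are needed before 20 jobs can run at the same time.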