4.6 years ago
b.ambrozio
I'm trying to simulate binary phenotypes from the 1000 Genomes Phase 3 dataset using gcta64 --simu-cc, without success. Everything seems to go well, but at the end I get:
Error: can not open the file [] to read.
An error occurs, please check the options or data
And the log shows:
Accepted options:
Here are the commands I'm using:
# Convert the VCF to PLINK binary (.bed/.bim/.fam) format:
$ ./plink2 --vcf ../../ALL.phase3.biallelic-only.vcf.gz.10kSNPs.vcf.gz --make-bed --out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
# Try to simulate the phenotype:
$ ./gcta64 --bfile ALL.phase3.biallelic-only.vcf.gz.10kSNPs --simu-cc 500 500 --simu-hsq 0.5 --simu-k 0.1 --simu-rep 3 --out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
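The empty file name in "can not open the file [] to read" suggests GCTA tried to open an input file whose path was never supplied. One likely candidate, assuming your GCTA version expects it for --simu-cc (worth checking in the simulation section of the GCTA documentation), is the list of causal variants passed with --simu-causal-loci, which the command above does not set. A minimal sketch for building such a list by drawing IDs at random from the .bim file follows; the helper name make_causal_list, the output name causal.snplist, the number of SNPs and the seed are hypothetical choices, not GCTA defaults. It skips IDs that occur more than once, because GCTA renames duplicates on the fly (see the warnings in the log), so those IDs would be ambiguous.

```shell
# Sketch: pick N variant IDs at random from a PLINK .bim file and write them,
# one per line, to a file meant for GCTA's --simu-causal-loci option.
# make_causal_list is a hypothetical helper, not part of GCTA or PLINK.
# Usage: make_causal_list <file.bim> <n> <out> [seed]
make_causal_list() {
  local bim="$1" n="$2" out="$3" seed="${4:-42}"
  # Column 2 of a .bim file is the variant ID. The first pass counts each ID;
  # the second pass keeps IDs seen exactly once (skipping duplicates and the
  # missing-ID placeholder ".") and prefixes each with a random sort key.
  awk -v seed="$seed" '
    BEGIN     { srand(seed) }
    NR == FNR { count[$2]++; next }
    count[$2] == 1 && $2 != "." { printf "%.10f\t%s\n", rand(), $2 }
  ' "$bim" "$bim" \
    | LC_ALL=C sort -k1,1 \
    | head -n "$n" \
    | cut -f2 > "$out"
}
```

For example, with 100 causal SNPs (an arbitrary number):
$ make_causal_list ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim 100 causal.snplist
$ ./gcta64 --bfile ALL.phase3.biallelic-only.vcf.gz.10kSNPs --simu-cc 500 500 --simu-causal-loci causal.snplist --simu-hsq 0.5 --simu-k 0.1 --simu-rep 3 --out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
If I read the GCTA documentation correctly, the list may also carry a second column of effect sizes; without it, effects are drawn at random.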
Here are all the steps with their outputs:
$ ls
gcta64 plink2
$ ./plink2 --vcf ../../ALL.phase3.biallelic-only.vcf.gz.10kSNPs.vcf.gz --make-bed --out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
PLINK v2.00a2.3 64-bit (24 Jan 2020) www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to ALL.phase3.biallelic-only.vcf.gz.10kSNPs.log.
Options in effect:
--make-bed
--out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
--vcf ../../ALL.phase3.biallelic-only.vcf.gz.10kSNPs.vcf.gz
Start time: Sun Mar 22 11:56:23 2020
16384 MiB RAM detected; reserving 8192 MiB for main workspace.
Using up to 8 compute threads.
--vcf: 220000 variants scanned.
--vcf: ALL.phase3.biallelic-only.vcf.gz.10kSNPs-temporary.pgen +
ALL.phase3.biallelic-only.vcf.gz.10kSNPs-temporary.pvar +
ALL.phase3.biallelic-only.vcf.gz.10kSNPs-temporary.psam written.
2504 samples (0 females, 0 males, 2504 ambiguous; 2504 founders) loaded from
ALL.phase3.biallelic-only.vcf.gz.10kSNPs-temporary.psam.
220000 variants loaded from
ALL.phase3.biallelic-only.vcf.gz.10kSNPs-temporary.pvar.
Note: No phenotype data present.
Writing ALL.phase3.biallelic-only.vcf.gz.10kSNPs.fam ... done.
Writing ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim ... done.
Writing ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bed ... done.
End time: Sun Mar 22 11:56:28 2020
$ ls
ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bed ALL.phase3.biallelic-only.vcf.gz.10kSNPs.fam gcta64
ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim ALL.phase3.biallelic-only.vcf.gz.10kSNPs.log plink2
$ ./gcta64 --bfile ALL.phase3.biallelic-only.vcf.gz.10kSNPs --simu-cc 500 500 --simu-hsq 0.5 --simu-k 0.1 --simu-rep 3 --out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.0 beta Mac
* (C) 2010-2019, The University of Queensland
* Please report bugs to Jian Yang <jian.yang@uq.edu.au>
*******************************************************************
Analysis started at 11:59:16 GMT on Sun Mar 22 2020.
Hostname: Brunos-MBP
Accepted options:
--bfile ALL.phase3.biallelic-only.vcf.gz.10kSNPs
--simu-cc 500 500
--simu-hsq 0.5
--simu-k 0.1
--simu-rep 3
--out ALL.phase3.biallelic-only.vcf.gz.10kSNPs
Reading PLINK FAM file from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.fam].
2504 individuals to be included from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.fam].
Reading PLINK BIM file from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim].
220000 SNPs to be included from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim].
Warning: Duplicated SNP ID "rs145607083" has been changed to "rs145607083_5264"
.Warning: Duplicated SNP ID "rs145607083" has been changed to "rs145607083_5265"
.Warning: Duplicated SNP ID "rs71955229" has been changed to "rs71955229_27061"
.Warning: Duplicated SNP ID "rs71955229" has been changed to "rs71955229_27062"
.Warning: Duplicated SNP ID "rs71589472" has been changed to "rs71589472_42505"
.Warning: Duplicated SNP ID "rs563156514" has been changed to "rs563156514_49111"
.Warning: Duplicated SNP ID "rs563156514" has been changed to "rs563156514_49112"
.Warning: Duplicated SNP ID "rs539504239" has been changed to "rs539504239_79196"
.Warning: Duplicated SNP ID "rs35739849" has been changed to "rs35739849_105514"
.Warning: Duplicated SNP ID "rs148795567" has been changed to "rs148795567_123134"
.Warning: Duplicated SNP ID "rs143101359" has been changed to "rs143101359_201815"
.Reading PLINK BED file from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bed] in SNP-major format ...
Genotype data for 2504 individuals and 220000 SNPs to be included from [ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bed].
Simulation parameters:
Number of simulation replicate(s) = 3 (Default = 1)
Heritability of liability = 0.5 (Default = 0.1)
Disease prevalence = 0.1 (Default = 0.1)
Number of cases = 500
Number of controls = 500
Error: can not open the file [] to read.
An error occurs, please check the options or data
$ head ALL.phase3.biallelic-only.vcf.gz.10kSNPs.log
*******************************************************************
* Genome-wide Complex Trait Analysis (GCTA)
* version 1.93.0 beta Mac
* (C) 2010-2019, The University of Queensland
* Please report bugs to Jian Yang <jian.yang@uq.edu.au>
*******************************************************************
Analysis started at 11:59:16 GMT on Sun Mar 22 2020.
Hostname: Brunos-MBP
Accepted options:
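Separately from the error, the warnings above show that the .bim file contains repeated variant IDs, which GCTA renames while reading. A small sketch for listing which IDs repeat and how often, so they can be inspected or excluded before simulating; list_dup_ids is a hypothetical helper name, and it assumes the standard six-column .bim layout with the variant ID in column 2.

```shell
# Print each variant ID that occurs more than once in a PLINK .bim file,
# followed by its number of occurrences (tab-separated), sorted by ID.
# list_dup_ids is a hypothetical helper, not part of GCTA or PLINK.
list_dup_ids() {
  local bim="$1"
  awk '{ count[$2]++ }
       END { for (id in count) if (count[id] > 1) printf "%s\t%d\n", id, count[id] }' "$bim" \
    | LC_ALL=C sort -k1,1
}
```

For example, list_dup_ids ALL.phase3.biallelic-only.vcf.gz.10kSNPs.bim | cut -f1 > dup.ids writes just the IDs; one option is then to pass that file to PLINK's --exclude, which removes every variant carrying a listed ID, before re-running GCTA.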
I'm open to using other tools if you have any recommendations. I'm not sure, but it looks like PLINK can't do this (it can simulate a phenotype only if it also simulates the genotype data).
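For reference when comparing tools: assuming GCTA's --simu-cc follows the standard liability threshold model, the two parameters in the command map onto it roughly as below, with h^2 set by --simu-hsq and K by --simu-k. This is a textbook sketch under that assumption, not a description of GCTA's exact implementation; g_i stands for the genetic value of individual i summed over the causal variants.

$$
\ell_i = g_i + e_i,\qquad \operatorname{Var}(g_i) = h^2,\qquad e_i \sim N\!\left(0,\ 1 - h^2\right)
$$

$$
y_i = \begin{cases} 1 & \text{if } \ell_i > t \\ 0 & \text{otherwise} \end{cases},\qquad t = \Phi^{-1}(1 - K)
$$

With K = 0.1 this gives t = \Phi^{-1}(0.9) \approx 1.2816, and h^2 = 0.5 splits the unit liability variance equally between the genetic and environmental parts.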