Hi All,
I am having a problem merging the per-chromosome UK Biobank files. I ran the following command:
plink2 \
--bfile /path/to/file/ukb_imp_chr1 \
--pmerge-list /path/to/file/merge.list \
--maf 0.01 \
--hwe 1e-6 \
--make-pgen \
--out /path/to/file/ukb_imp_allchr
I also tried
plink2 \
--bfile /path/to/file/ukb_imp_chr1 \
--pmerge-list /path/to/file/merge.list \
--maf 0.01 \
--hwe 1e-6 \
--make-bed \
--out /path/to/file/ukb_imp_allchr
The merge.list has the following content from chromosome 2 onwards.
/path/to/file/ukb_imp_chr2
/path/to/file/ukb_imp_chr3
/path/to/file/ukb_imp_chr4
...
/path/to/file/ukb_imp_chr22
/path/to/file/ukb_imp_chr23
/path/to/file/ukb_imp_chr24
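For reference, a list like this does not have to be typed by hand. A minimal sketch, assuming the placeholder directory /path/to/file from the post and an output file named merge.list in the current directory (both are stand-ins, not real locations):

```shell
# Placeholder directory copied from the post; substitute the real location.
prefix_dir="/path/to/file"
list_file="merge.list"

# Start from an empty file, then add one fileset prefix per line (chr2..chr24).
: > "$list_file"
for chr in $(seq 2 24); do
  printf '%s/ukb_imp_chr%s\n' "$prefix_dir" "$chr" >> "$list_file"
done
```

Chromosome 1 is left out on purpose because it is passed as the main fileset on the command line.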
However, once I run the command, I do not get a merged .bed file. I only get a .psam file with the following result:
Using up to 24 threads (change this with --threads).
--pmerge-list: 24 filesets specified (including main fileset).
--pmerge-list: 487409 samples and 1 phenotype present.
--pmerge-list: Merged .psam written to
/path/to/file/ukb_imp_allchr-merge.psam .
Is there something wrong with the command?
In both instances (--make-pgen or --make-bed), I only get the .psam file and nothing else.
Also, is it possible to export to bgen/pgen to reduce file size? The individual per-chromosome .bed files are between 200 and 920 GB each.
The alternative is to make .pgen files first and then use --pmerge-list. As an intermediate step, are VCF or pgen files smaller?

Hi,
I have since noticed that you're trying to merge bed files. You can try the --merge-list option (with PLINK 1.9) first, then convert the result to a plink2 pgen:
./plink --bfile /path/to/file/ukb_imp_chr1 --merge-list /path/to/file/merge.list --out merged_chrs
./plink2 --bfile merged_chrs --make-pgen --out filename
Or, if you want to merge pgen files, convert all of your individual chromosome data to pgen files first, then try merging:
./plink2 --pfile /path/to/file/ukb_imp_chr1 --pmerge-list /path/to/file/merge.list --out merged_chrs
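The per-chromosome conversion that has to happen before a pgen merge could be scripted roughly like this. It is only a dry-run sketch: the function prints the plink2 commands instead of running them, and /path/to/file/ukb_imp_chrN is the placeholder naming from the question, not a real path.

```shell
# Placeholder directory from the question; substitute the real location.
in_dir="/path/to/file"

# Print one bed -> pgen conversion command per chromosome (1..24).
# Nothing is executed here; pipe the output to 'sh' once it looks right.
print_convert_cmds() {
  for chr in $(seq 1 24); do
    echo "plink2 --bfile $in_dir/ukb_imp_chr$chr --make-pgen --out $in_dir/ukb_imp_chr$chr"
  done
}

print_convert_cmds
```

Keeping the same prefix for the output means the existing merge.list can be reused unchanged, since the .pgen/.pvar/.psam files sit next to the original .bed/.bim/.fam files.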
Pgen files are similar to bed files, except that they can also store dosage information for the SNPs (so in a way they are more similar to VCF files).
Let me know if the pgen merge works after conversion. I couldn't get it to work for my data, hence the workaround using bcftools. Even so, I believe it is easier to keep track of ref/alt alleles when merging in VCF/BCF format (especially if this is post-imputation). It is also generally faster.
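For anyone wanting to try the VCF/BCF route, one possible shape of it is sketched below. This is an assumption about how such a workaround could look, not necessarily the exact commands used above: each chromosome is exported to a bgzipped VCF with plink2, and the files are then joined with bcftools concat. The paths and the vcf.list file name are placeholders, and the function only prints the commands rather than running them.

```shell
# Placeholder directory from the question; vcf.list is a hypothetical file name.
in_dir="/path/to/file"
vcf_list="vcf.list"

# List the per-chromosome VCFs in chromosome order for bcftools concat.
: > "$vcf_list"
for chr in $(seq 1 24); do
  printf '%s/ukb_imp_chr%s.vcf.gz\n' "$in_dir" "$chr" >> "$vcf_list"
done

# Print (do not run) the export commands followed by the concat command.
print_bcf_merge_cmds() {
  for chr in $(seq 1 24); do
    echo "plink2 --bfile $in_dir/ukb_imp_chr$chr --export vcf bgz --out $in_dir/ukb_imp_chr$chr"
  done
  echo "bcftools concat --file-list $vcf_list --output-type b --output $in_dir/ukb_imp_allchr.bcf"
}

print_bcf_merge_cmds
```

Note that bcftools concat expects every input file to contain the same samples in the same order, which should hold when all chromosomes come from the same cohort release.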