I have some 1000 Genomes Phase 3 data as binary PLINK files. Reading in the files results in the following:
# read in files from directory
fam.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.fam"
bim.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bim"
bed.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bed"
# convert to gds format
snpgdsBED2GDS(bed.chr1.1000GP3, bim.chr1.1000GP3, fam.chr1.1000GP3, "chr1.1000GP3.gds", snpfirstdim = FALSE)

Start snpgdsBED2GDS ...
    BED file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bed" in the SNP-major mode (Sample X SNP)
    FAM file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bim", DONE.
    BIM file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.fam", DONE.
Tue Feb 14 16:43:56 2017    store sample id, snp id, position, and chromosome.
    start writing: 530207 samples, 2504 SNPs ...
    Tue Feb 14 16:43:56 2017    0%
    Tue Feb 14 16:44:21 2017    100%
Tue Feb 14 16:44:24 2017    Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
    open the file "chr1.1000GP3.gds" (size: 334054845).
    # of fragments in total: 39.
    save it to "chr1.1000GP3.gds.tmp".
    rename "chr1.1000GP3.gds.tmp" (size: 334054593).
    # of fragments in total: 18.
Warning message:
In snpgdsBED2GDS(bed.chr1.1000GP3, bim.chr1.1000GP3, fam.chr1.1000GP3, :
  NAs introduced by coercion
Essentially the data are swapped: instead of 2504 samples with 530207 SNPs, it is the other way round. How do I change my PLINK files so that the input reflects 2504 individuals and ~530k SNPs?