
Question

How to correct PLINK input for SNPRelate so that samples are not read in as SNPs


8.2 years ago

moithuti • 0

I have some 1000 Genomes Phase 3 data as binary PLINK files. Reading in the files results in the following:

# read in files from directory
fam.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.fam"
bim.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bim"
bed.chr1.1000GP3 <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bed"

# convert to gds format
snpgdsBED2GDS(bed.chr1.1000GP3, bim.chr1.1000GP3, fam.chr1.1000GP3, "chr1.1000GP3.gds", snpfirstdim = FALSE)

Start snpgdsBED2GDS ...
    BED file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bed" in the SNP-major mode (Sample X SNP)
    FAM file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.bim", DONE.
    BIM file: "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender.fam", DONE.
Tue Feb 14 16:43:56 2017    store sample id, snp id, position, and chromosome.
    start writing: 530207 samples, 2504 SNPs ...
    Tue Feb 14 16:43:56 2017    0%
    Tue Feb 14 16:44:21 2017    100%
Tue Feb 14 16:44:24 2017    Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
    open the file "chr1.1000GP3.gds" (size: 334054845).
    # of fragments in total: 39.
    save it to "chr1.1000GP3.gds.tmp".
    rename "chr1.1000GP3.gds.tmp" (size: 334054593).
    # of fragments in total: 18.
Warning message:
In snpgdsBED2GDS(bed.chr1.1000GP3, bim.chr1.1000GP3, fam.chr1.1000GP3, :
  NAs introduced by coercion

Essentially the data are swapped: instead of 2504 samples with 530,207 SNPs, it is the other way round. How do I change my PLINK files so that the input reflects 2504 individuals and ~500k SNPs?

SNPRelate PLINK Data format PCA • 2.1k views


score 0 · Answer 1 · 2017-02-21

I took a simple workaround to solve this: I converted the binary PLINK files to VCF, and the input was then read properly, with the individuals as samples and the variants as SNPs. If anyone can get the PLINK format to work, I would still like to know how to correct the PLINK input so that it is read in the same way.
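One possible explanation, going only by the log in the question: it reports the .bim file as the FAM file and the .fam file as the BIM file, which suggests the two were passed in the wrong positions rather than the PLINK files themselves being wrong. As far as I know, snpgdsBED2GDS() takes its arguments in the order bed.fn, fam.fn, bim.fn (worth confirming with ?snpgdsBED2GDS). A minimal sketch that passes them by name so the order cannot matter is below. The helper plink_paths() and the variable prefix are made up for illustration, and the call is guarded so it only runs if SNPRelate is installed and the files exist.

```r
# Sketch: build the three PLINK file paths from a shared prefix and pass them
# to snpgdsBED2GDS() by argument name, so .fam and .bim cannot be swapped.
# `plink_paths` and `prefix` are hypothetical names used only for illustration.
plink_paths <- function(prefix) {
  list(
    bed = paste0(prefix, ".bed"),
    fam = paste0(prefix, ".fam"),
    bim = paste0(prefix, ".bim")
  )
}

prefix <- "chr1.common.variants.1000GP3_20130502_allchrSNPs_nodup2_updatefidparentidgender"
paths <- plink_paths(prefix)

# Only run the conversion if SNPRelate is installed and all three files exist.
if (requireNamespace("SNPRelate", quietly = TRUE) &&
    all(file.exists(unlist(paths)))) {
  SNPRelate::snpgdsBED2GDS(
    bed.fn      = paths$bed,
    fam.fn      = paths$fam,  # sample information, one row per individual
    bim.fn      = paths$bim,  # variant information, one row per SNP
    out.gdsfn   = "chr1.1000GP3.gds",
    snpfirstdim = FALSE       # same setting as in the question
  )
  # Print the sample and SNP counts stored in the new GDS file.
  SNPRelate::snpgdsSummary("chr1.1000GP3.gds")
}
```

If the swapped arguments are indeed the cause, the conversion log should list the .fam file under "FAM file" and the .bim file under "BIM file", and the summary should show roughly 2504 samples and ~500k SNPs. The "NAs introduced by coercion" warning may also go away, since it plausibly came from parsing .fam columns as if they were .bim columns, but that is a guess.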