Convert SNP dataset from PACKEDANCESTRYMAP to plink (PED)
7.1 years ago
BlastedBadger ▴ 160

I have downloaded the "Affymetrix Human Origins Curated" dataset from David Reich's lab, but I am at a loss to understand which format it is in and how I could convert it to something usable by plink.

So far, I have downloaded the convertf utility from the AdmixTools package. Based on the convertf README, I assume that the .geno, .snp and .ind files are in "PACKEDANCESTRYMAP" format.

I attempted to convert them to PED using the following "parfile" for convertf:

genotypename:    panel1.geno
snpname:         panel1.snp
indivname:       panel1.ind
outputformat:    PED
genotypeoutname: panel1-PED.ped
snpoutname:      panel1-PED.map
indivoutname:    panel1-PED.pedind

Then convertf -p parfile seems to work, but the output format is not accepted by plink!

I tried this command to test:

 plink1 --noweb --ped panel1-PED.ped --map panel1-PED.map --make-bed --out panel1-BED

And it failed like this:

@----------------------------------------------------------@
|        PLINK!       |     v1.07      |   10/Aug/2009     |
|----------------------------------------------------------|
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|        http://pngu.mgh.harvard.edu/purcell/plink/        |
@----------------------------------------------------------@

Skipping web check... [ --noweb ]
Writing this text to log file [ panel1-BED.log ]
Analysis started: Fri Oct 13 11:51:03 2017

Options in effect:
        --noweb
        --ped panel1-PED.ped
        --map panel1-PED.map
        --make-bed
        --out panel1-BED


ERROR: Problem with MAP file line:
1  Affx-4964829     0.013491      1349123 A G

So the map file is incorrectly formatted: it has two extra, unwanted columns at the end.

My question is: why doesn't convertf output a correct map format? And is it safe to remove these two last columns using awk or sed (I did it, and plink seemed to make the conversion)?
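
On the awk route mentioned above: a standard PLINK .map file has four columns (chromosome, SNP identifier, genetic distance, base-pair position), so trimming to the first four fields gives the expected layout. A minimal sketch, assuming the fields appear in that order as in the error line; the file names example-6col.map and example-4col.map are made up for illustration:

```shell
# Sample 6-column line copied from the PLINK error message; file names are hypothetical.
printf '1  Affx-4964829     0.013491      1349123 A G\n' > example-6col.map

# Keep only the first four whitespace-separated fields (chromosome, SNP ID,
# genetic distance, base-pair position) and write them tab-separated.
awk 'BEGIN { OFS = "\t" } { print $1, $2, $3, $4 }' example-6col.map > example-4col.map

# Show the trimmed result.
cat example-4col.map
```

The same awk line can be pointed at panel1-PED.map; writing to a new file keeps the original convertf output untouched.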

SNP plink • 4.5k views

The convertf from AdmixTools is really not working as it should. For example, the .fam file produced by using "PACKEDPED" output does not contain any population information anymore. The first column (family IDs) is just a row number... I am gonna try with Eigensoft.
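
One possible workaround for the lost population labels, as a sketch rather than a confirmed fix: the .ind file lists sample ID, sex and population label on each line, so the label can be copied back into the first column of the .fam by matching on sample ID. This assumes the individual IDs in the .fam are identical to those in the .ind; the toy file names and contents below are hypothetical:

```shell
# Hypothetical toy inputs: an EIGENSTRAT-style .ind (ID, sex, population)
# and a .fam whose first column is just a row number.
printf 'S1 M French\nS2 F Yoruba\n' > toy.ind
printf '1 S1 0 0 1 1\n2 S2 0 0 2 1\n' > toy.fam

# First file: remember the population label for each sample ID.
# Second file: if the individual ID (column 2) is known, overwrite the
# family ID (column 1) with that label; unmatched lines pass through unchanged.
awk 'NR == FNR { pop[$1] = $3; next }
     ($2 in pop) { $1 = pop[$2] }
     { print }' toy.ind toy.fam > toy.relabelled.fam

cat toy.relabelled.fam
```

Matching on ID rather than on line order avoids relying on convertf preserving the sample order, which is not verified here.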


Alright, convertf from Eigensoft is doing the same.


I've met the same problem. Did you manage to solve it?


Hi guys, I've recently found this script, which does the opposite conversion (VCF > plink > admixtools), but you can probably explore its logic to get what you need. In particular it uses several awk commands to tweak the intermediates between vcftools and convertf.
