I have downloaded the "Affymetrix Human Origins Curated" dataset from David Reich's lab, but I am at a loss to understand which format it is in and how I could convert it to something usable by plink.
So far, I have downloaded the convertf utility (available in the AdmixTools package, for example).
Based on the convertf README, I assumed that the .geno, .snp and .ind files are in "PACKEDANCESTRYMAP" format.
I attempted to convert them to PED using the following "parfile" for convertf:
genotypename: panel1.geno
snpname: panel1.snp
indivname: panel1.ind
outputformat: PED
genotypeoutname: panel1-PED.ped
snpoutname: panel1-PED.map
indivoutname: panel1-PED.pedind
Then convertf -p parfile seems to work, but the output format is not accepted by plink!
I tried this command to test:
plink1 --noweb --ped panel1-PED.ped --map panel1-PED.map --make-bed --out panel1-BED
And it failed like this:
@----------------------------------------------------------@
| PLINK! | v1.07 | 10/Aug/2009 |
|----------------------------------------------------------|
| (C) 2009 Shaun Purcell, GNU General Public License, v2 |
|----------------------------------------------------------|
| For documentation, citation & bug-report instructions: |
| http://pngu.mgh.harvard.edu/purcell/plink/ |
@----------------------------------------------------------@
Skipping web check... [ --noweb ]
Writing this text to log file [ panel1-BED.log ]
Analysis started: Fri Oct 13 11:51:03 2017
Options in effect:
--noweb
--ped panel1-PED.ped
--map panel1-PED.map
--make-bed
--out panel1-BED
ERROR: Problem with MAP file line:
1 Affx-4964829 0.013491 1349123 A G
So the map file is incorrectly formatted: it has two extra, unwanted columns at the end.
My question is: why doesn't convertf output a correct map format? And is it safe to remove these last two columns using awk or sed (I did so, and plink then seemed to perform the conversion)?
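For reference, plink 1.07 expects exactly four columns in a MAP file (chromosome, SNP ID, genetic distance, base-pair position). The two trailing columns in the error above look like the SNP alleles, which plink reads from the PED file anyway, so dropping them should not lose information plink needs. Here is a minimal Python sketch of that trimming step, as an alternative to awk or sed. The function names are my own and are not part of convertf or plink.

```python
def trim_map_line(line: str) -> str:
    """Keep only the four standard MAP columns: chrom, SNP ID, cM/Morgans, bp."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    # Anything after column 4 (here, presumably the two alleles) is dropped.
    return "\t".join(fields[:4])


def trim_map_file(src: str, dst: str) -> None:
    """Write a 4-column copy of the MAP file `src` to `dst`, skipping blank lines."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(trim_map_line(line) + "\n")
```

For example, trim_map_file("panel1-PED.map", "panel1-PED.4col.map") would write a trimmed copy (the output name is just a placeholder) and leave the original file untouched, so the result can be compared before passing it to plink with --map.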
The convertf from AdmixTools is really not working as it should. For example, the .fam file produced with the "PACKEDPED" output format no longer contains any population information: the first column (family IDs) is just a row number... I am going to try with Eigensoft.
Alright, convertf from Eigensoft is doing the same.
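If the population labels are only missing from the .fam file, one possible workaround is to copy them back from the original .ind file. Below is a rough Python sketch. It assumes that each .ind line is "sample ID, sex, population" and that the second column of the .fam file still holds the same sample ID; I have not checked that convertf guarantees this, so verify it on your own files. The function names and the sample IDs in the usage note are made up for illustration.

```python
def load_populations(ind_lines):
    """Map sample ID -> population label from .ind lines (ID, sex, population)."""
    pops = {}
    for line in ind_lines:
        fields = line.split()
        if len(fields) >= 3:
            pops[fields[0]] = fields[2]
    return pops


def relabel_fam_line(fam_line, pops):
    """Replace the family ID (column 1) with the sample's population, if known."""
    fields = fam_line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns in .fam line, got {len(fields)}")
    # Assumption: column 2 (individual ID) matches the sample ID in the .ind file.
    # Samples without a match keep their original family ID.
    fields[0] = pops.get(fields[1], fields[0])
    return " ".join(fields)
```

With pops = load_populations(["Sample1 M Yoruba"]), a line such as "1 Sample1 0 0 1 1" becomes "Yoruba Sample1 0 0 1 1". Write the result to a new .fam file rather than overwriting the one convertf produced.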
I've met the same problem. Did you manage to solve it?
Hi guys, I've recently found this script, which does the opposite conversion (VCF > plink > admixtools), but you can probably explore its logic to get what you need. In particular, it uses several awk commands to tweak the intermediate files between vcftools and convertf.