I'm running EIG 7.2.1 and using convertf to make a geno file with the following parameters:
genotypename: merged.eigenstrat.34string.length.bed
snpname: merged.eigenstrat.34string.length.map
indivname: 56001801066929_WGZ.snp.fam
outputformat: PACKEDANCESTRYMAP
genotypeoutname: merged.eigenstrat.34string.length.geno
snpoutname: merged.eigenstrat.34string.length.snp
indivoutname: merged.eigenstrat.34string.length.ind
familynames: NO
This is only being done for chr22 as a tutorial for myself to generate an easy-to-use wrapper.
I have 2,504 1000 Genomes samples, plus one additional unknown sample added in.
The original input is a VCF, which has been modified so that the ID column contains only strings shorter than 35 characters.
The bed file was generated thus:
~/Scripts/plink1.9/plink --vcf merged.eigenstrat.34string.length.vcf --make-bed --out merged.eigenstrat.34string.length
and the map file thus:
~/Scripts/plink1.9/plink --vcf merged.eigenstrat.34string.length.vcf --recode --out merged.eigenstrat.34string.length
The end of the fam file looks like:
Y035 NA19198 0 0 1 -9
Y010 NA18511 0 0 2 -9
Y035 NA19197 0 0 2 -9
Y014 NA18519 0 0 1 -9
56001801066929_WGZ 56001801066929_WGZ 0 0 0 -9
The added sample is the last line.
When I run convertf:
bin/convertf -p convertf.params.txt
I get a repeat of the input parameters, followed by this on STDOUT:
genetic distance set from physical distance
HG01889 ignored
HG01894 ignored
HG01958 ignored
HG02314 ignored
HG01989 ignored
...
NA18511 ignored
NA19197 ignored
NA18519 ignored
56001801066929_WGZ ignored
all individuals set ignore. Likely input problem (col 6)
resetting all individual...
genotype file processed
numvalidind: 2505 maxmiss: 2505001
packedancestrymap output
##end of convertf run
Why are all individuals being ignored?
How can I modify the input to get rid of the input problem?
How can I modify the file so that all individuals are converted?
Furthermore, how can I specify which individuals are YRI, CEU, FIN, etc.?
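My guess from the "(col 6)" hint is that the -9 in column 6 of the fam file is what makes convertf ignore everyone. Below is a minimal sketch of what I might try. It assumes (unverified) that convertf accepts a population label in column 6, and that I have a two-column panel file of sample ID and population. The function names and the "Unknown" default label are my own placeholders, not anything from EIGENSOFT.

```python
def load_panel(lines):
    """Build a sample ID -> population mapping from 'sample pop' lines.

    Assumed layout: whitespace-separated, sample ID in column 1 and
    population (e.g. YRI, CEU, FIN) in column 2. A header line whose
    first field is 'sample' is skipped.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or fields[0] == "sample":
            continue
        mapping[fields[0]] = fields[1]
    return mapping


def relabel_fam(fam_lines, sample_to_pop, default="Unknown"):
    """Return fam lines with column 6 replaced by a population label.

    Column 2 of the fam file (the individual ID) is used as the lookup
    key. Samples missing from the mapping get the `default` label, so
    column 6 is never left as -9.
    """
    relabeled = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        fields[5] = sample_to_pop.get(fields[1], default)
        relabeled.append(" ".join(fields[:6]))
    return relabeled
```

The idea would be to read the existing fam file, pass its lines through relabel_fam, write the result to a new fam file, and point indivname at that file. Is that the right approach?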