Input problem with EIGENSTRAT
2.5 years ago
dec986 ▴ 380

I'm running EIG 7.2.1 and using convertf to make a geno file with the following parameters:

genotypename:   merged.eigenstrat.34string.length.beded
snpname:    merged.eigenstrat.34string.length.map
indivname:  56001801066929_WGZ.snp.fam
outputformat:   PACKEDANCESTRYMAP
genotypeoutname:    merged.eigenstrat.34string.length.geno
snpoutname: merged.eigenstrat.34string.length.snp
indivoutname:   merged.eigenstrat.34string.length.ind
familynames:    NO

This is only being done for chr22 as a tutorial for myself to generate an easy-to-use wrapper.

I have 2,504 1000 Genomes samples, with one additional unknown sample added in.

The original input is a VCF, which has been modified so that the ID column has only strings < 35 characters long.

The bed file was generated thus:

~/Scripts/plink1.9/plink --vcf merged.eigenstrat.34string.length.vcf --make-bed --out merged.eigenstrat.34string.length

and the map file thus:

~/Scripts/plink1.9/plink --vcf merged.eigenstrat.34string.length.vcf --recode --out merged.eigenstrat.34string.length

The end of the fam file looks like:

Y035    NA19198 0   0   1   -9
Y010    NA18511 0   0   2   -9
Y035    NA19197 0   0   2   -9
Y014    NA18519 0   0   1   -9
56001801066929_WGZ  56001801066929_WGZ  0   0   0   -9

The added sample is the last line. When I run convertf:

bin/convertf -p convertf.params.txt

I get a repeat of the input parameters and then the following on STDOUT:

genetic distance set from physical distance
HG01889 ignored
HG01894 ignored
HG01958 ignored
HG02314 ignored
HG01989 ignored
...
NA18511 ignored
NA19197 ignored
NA18519 ignored
56001801066929_WGZ ignored
all individuals set ignore.  Likely input problem (col 6)
resetting all individual...
genotype file processed
numvalidind:   2505  maxmiss: 2505001
packedancestrymap output
##end of convertf run

Why are all individuals being ignored?

How can I modify the input so that all individuals are converted? Furthermore, how can I specify which individuals are YRI, CEU, FIN, etc.?

2.5 years ago
4galaxy77 2.9k

This is an annoying quirk of using plink to make EIGENSTRAT input: plink writes -9 (missing phenotype) in the last column of the fam file, and convertf ignores every sample that has that value. Change it to 1 for all samples.
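
If you would rather script the change than edit the file by hand, something along these lines should work. It is only a minimal sketch: the function name rewrite_fam_column6 and the optional labels mapping are my own inventions, not part of plink or EIGENSOFT. The labels part rests on an assumption you should check against the convertf README, namely that a non-numeric value in column 6 is carried through to the .ind file as a population label.

```python
def rewrite_fam_column6(fam_in, fam_out, default="1", labels=None):
    """Copy a PLINK .fam file, replacing column 6 on every line.

    default: value written for samples not found in `labels`.
    labels:  optional dict mapping sample ID (column 2) to a string such as
             "YRI" or "CEU". Treating column 6 as a population label is an
             assumption about convertf, not something this code verifies.
    Returns the number of sample lines written.
    """
    labels = labels or {}
    written = 0
    with open(fam_in) as src, open(fam_out, "w") as dst:
        for line_no, line in enumerate(src, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 6:
                raise ValueError(
                    f"line {line_no}: expected 6 columns, found {len(fields)}"
                )
            # Column 2 is the within-family sample ID; column 6 is the phenotype.
            fields[5] = labels.get(fields[1], default)
            dst.write("\t".join(fields) + "\n")
            written += 1
    return written
```

Call it with the original fam file as fam_in and a new path as fam_out, then point indivname in the convertf parameter file at the new file. Writing to a separate file keeps the plink output untouched in case you need it again.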
