I first obtained .tped and .tfam files from a .vcf genotype file for our GWAS population using PLINK. I'm now trying to use those .tped and .tfam files to build a kinship matrix with EMMAX.
I'm getting an error that I'm not familiar with, and I can't find any relevant discussion of it online.
Input: emmax-kin -v -s -d 10 [prefix for input .tped and .tfam]
The input files (obtained via PLINK) appear to be consistent with how .tped and .tfam files are supposed to look: https://www.cog-genomics.org/plink2/formats
Bottom 5 rows, first 12 columns of input .tped file:
scaffold_338 . 0 19212 0 0 0 0 0 0 0 0
scaffold_338 . 0 19274 0 0 0 0 0 0 0 0
scaffold_338 . 0 19312 0 0 0 0 0 0 0 0
scaffold_338 . 0 19426 0 0 0 0 0 0 T T
scaffold_338 . 0 19428 0 0 0 0 0 0 C C
Bottom 5 rows of input .tfam file:
852 1015268 0 0 0 -9
852 1015271 0 0 0 -9
852 1015274 0 0 0 -9
852 1015277 0 0 0 -9
852 1015280 0 0 0 -9
Output:
Reading TFAM file [my input file prefix].tfam ....
Reading TPED file [my input file prefix].tped ....
Unrecognized token C
Desired output: A .kinf file (kinship matrix)
I'm at a loss as to how to address this problem, so any help is greatly appreciated. Thanks for your time.
I have no experience with emmax-kin, but its source code suggests it might expect genotypes encoded as 0, 1, or 2, and it encountered the letter base code 'C' in your TPED. Does that make sense?
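If you want to test that idea before changing anything, here is a rough Python sketch that counts allele tokens outside 0/1/2 in TPED lines. The allowed set is my assumption from reading the source, not documented emmax-kin behaviour, and the function name is just something I made up for illustration. The sample lines are taken from the rows you posted.

```python
from collections import Counter

# Assumption: emmax-kin only accepts these allele tokens (0 = missing).
ALLOWED = {"0", "1", "2"}

def unexpected_allele_tokens(lines):
    """Count allele tokens outside ALLOWED in TPED-formatted lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # TPED columns: chromosome, variant ID, genetic distance,
        # base-pair position, then two allele tokens per sample.
        for token in fields[4:]:
            if token not in ALLOWED:
                counts[token] += 1
    return counts

sample = [
    "scaffold_338 . 0 19312 0 0 0 0 0 0 0 0",
    "scaffold_338 . 0 19426 0 0 0 0 0 0 T T",
    "scaffold_338 . 0 19428 0 0 0 0 0 0 C C",
]
print(unexpected_allele_tokens(sample))
```

For a real file you could pass the open file handle instead of the list, since the function only iterates over lines. A non-empty result would be consistent with the "Unrecognized token" message.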
I've looked at that part of the source code, both on its own and in its broader context, and I don't understand why it would want genotypes encoded as 0, 1, or 2 (or how that's possible) when a .tped file has G/A/T/C/0 for each allele.
Hope somebody can clarify...
When generating your .tped, did you use the PLINK --recode12 --output-missing-genotype 0 options? I'm going from the EMMAX web page:
https://genome.sph.umich.edu/wiki/EMMAX#Preparing_Input_Genotype_Files
http://zzz.bwh.harvard.edu/plink/dataman.shtml#recode
--recode12 will recode the alleles as 1 and 2.
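In case it helps, here is a small Python sketch that assembles a PLINK call with those options for an existing .tped/.tfam pair. The prefixes `my_gwas` and `my_gwas_12` are placeholders, the helper name is hypothetical, and I'm assuming a `plink` executable on your PATH that accepts `--tfile` and `--transpose` alongside the two options above. Treat it as a starting point, not a tested pipeline.

```python
import shlex

def build_recode_command(in_prefix, out_prefix, plink="plink"):
    """Build a PLINK argument list that rewrites a TPED/TFAM pair with 1/2 allele codes."""
    return [
        plink,
        "--tfile", in_prefix,                # read in_prefix.tped and in_prefix.tfam
        "--recode12",                        # recode alleles as 1 and 2
        "--output-missing-genotype", "0",    # write missing genotypes as 0
        "--transpose",                       # write transposed (.tped/.tfam) output
        "--out", out_prefix,
    ]

cmd = build_recode_command("my_gwas", "my_gwas_12")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The printed line can be pasted into a shell as is. The new prefix is then what you would pass to emmax-kin in place of the original one.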
This solved the problem. Thank you!