I am running GWAS analyses with gemma that seem to work but when I open the centered relatedness matrix (.cXX), I do get a symmetric matrix but the main diagonal is not made of 1s ... why?
The manual p.11 part 3.3.1 shows an example of matrix with a main diagonal which is not made of 1s. I also tried to dig into the gemma code on Github but I did not find the piece of code used to calculate the relatedness matrix.
Commands and output
Plink and gemma commands to generate the matrix:
# Generate plink file from my VCF file (67 samples)
vcftools --gzvcf $vcf_file --plink --out $prefix_vcf
# Make binary plink files
plink --file $prefix_vcf --make-bed --out $prefix_vcf
# Generate centered relatedness matrix with gemma
gemma -bfile $prefix_vcf -gk 1 -o $prefix_vcf
Here the output log of gemma:
##
## GEMMA Version = 0.94
##
## Command Line Input = -bfile subset_67_accessions_wo_singletons_only_alt_allele -gk 1 -o subset_67_accessions_wo_singletons_only_alt_allele
##
## Summary Statistics:
## number of total individuals = 67
## number of analyzed individuals = 67
## number of covariates = 1
## number of phenotypes = 1
## number of total SNPs = 927314
## number of analyzed SNPs = 39678
##
## Computation Time:
## total computation time = 0.123833 min
## computation time break down:
## time on calculating relatedness matrix = 0.00733333 min
##
I get as expected a square matrix of order 67 (as my number of samples in the VCF file). Here are the first 5 rows and columns of the matrix:
0.5265889259 0.0918289748 0.01315201643 -0.02417258371 -0.00300349980
0.0918289748 0.4426259465 0.01751500729 -0.0196104412 -0.04093555612
0.01315201643 0.01751500729 0.494663962 0.01773204554 0.03579284737
-0.02417258371 -0.0196104412 0.01773204554 0.5622735808 0.05660374681
-0.003003499807 -0.04093555612 0.03579284737 0.05660374681 0.6064086438
There is symmetry but not a main diagonal of 1s while same individuals should be identical to themselves, shouldn't they?
NB: I posted a similar post on the gemma-discussion google group but still did not get an answer after a month.
I have not used GEMMA; however, I don't recall other relatedness metrics (that I have used) equaling 1 for when the same person is being compared to themselves. I believe it's technically a limitless scale that can be both positive and negative, and which may be based on positions that are both genotyped in one individual in your dataset and those that are only genotyped in others. Probably better to hear from the developers of the program, as they may be the only ones who know how it was coded.
Thanks for your comment. Looking around, I could find that usually the main diagonal is made of 1 or values above 1 if consanguinity. The figure 1 of the Bae et al 2014 is illustrating well what I expect from a kinship matrix. Since individuals compared to themselves are based on the same SNP call datasets, it seems unlikely to me that the variations from 1 are due to missing data.