Dear all,
I'm conducting a genome-wide association study (GWAS) using GEMMA on data from three pig breeds, focusing on their meat traits. As a newcomer to GWAS analysis, I'd appreciate guidance on using covariates in my analysis.
I've completed variant calling, genotype imputation, and quality control on the data. I've also prepared my phenotype data (focusing on one trait) and calculated the kinship matrix using GEMMA.
For the analysis, I plan to use a univariate linear mixed model. I'd like to include breed, location, and month as covariates. However, I'm unsure how these categorical variables should be encoded in the covariate file.
Here's a sample section of my covariate file:
1s Breed Loc. Month
1 1 1 1
1 1 1 1
1 2 2 2
1 3 1 1
1 1 2 3
1 3 1 1
1 2 1 2
1 3 2 1
1 1 2 4
1 3 2 5
......
The first column (all 1s) is the intercept. I've added the header row here for clarity.
My concerns are:
- Have I formatted the covariate file correctly?
- By encoding breeds as 1, 2, and 3, am I implying an ordered, numeric relationship among them, so that they are treated as a single continuous covariate? Should I use one-hot encoding instead?
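To spell out the second concern, here is my understanding of what the two codings would mean for breed, written as a rough sketch. The symbols are my own notation, not GEMMA's: b_i is the integer breed code of animal i, x_{i,2} and x_{i,3} are 0/1 indicators for breeds 2 and 3, and all other covariates and the random effect are lumped into "other terms".

```latex
% Integer coding: one slope \beta for the breed code b_i \in \{1, 2, 3\}
y_i = \mu + \beta\, b_i + \text{(other terms)} + \varepsilon_i
% so the breed 3 vs. breed 1 difference is forced to be twice the breed 2 vs. breed 1 difference:
(\mu + 3\beta) - (\mu + \beta) = 2\beta = 2\,\bigl[(\mu + 2\beta) - (\mu + \beta)\bigr]

% Indicator coding with breed 1 as reference: one free effect per remaining breed
y_i = \mu + \beta_2\, x_{i,2} + \beta_3\, x_{i,3} + \text{(other terms)} + \varepsilon_i
```

If this reading is right, the integer coding is only appropriate when the breeds really do sit on an evenly spaced scale, which seems unlikely for breed, location, or month.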
Example of one-hot encoding for breed, location, and month:
Breed1 Breed2 Breed3 Loc1 Loc2 Month1 Month2 Month3 Month4 Month5
1 0 0 1 0 1 0 0 0 0
0 1 0 1 0 1 0 0 0 0
0 0 1 0 1 0 1 0 0 0
.....
Should I adopt this approach? Any guidance on using covariates in a GWAS with GEMMA, along with any general tips, would be greatly appreciated.
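In case it helps to make the question concrete, below is a minimal Python sketch of how I would generate an indicator-coded file from the integer codes in the sample above. It rests on assumptions I'd like checked: that the covariate file should have no header line, that the intercept has to be supplied explicitly as a column of 1s, and that one level per factor should be dropped as a reference so the indicator columns are not collinear with the intercept. The function names `dummy_columns` and `build_covariate_rows` are my own, not part of GEMMA.

```python
def dummy_columns(codes):
    """Return one 0/1 indicator list per level, skipping the first (reference) level."""
    levels = sorted(set(codes))
    return [[1 if c == level else 0 for c in codes] for level in levels[1:]]


def build_covariate_rows(*factors):
    """Combine an intercept column of 1s with dummy columns for each factor."""
    n = len(factors[0])
    columns = [[1] * n]  # explicit intercept column (assumed to be required)
    for codes in factors:
        if len(codes) != n:
            raise ValueError("all factors must have one value per animal")
        columns.extend(dummy_columns(codes))
    # Transpose from column-wise to one row per animal.
    return [list(row) for row in zip(*columns)]


# The ten animals from the sample covariate file above (integer codes).
breed = [1, 1, 2, 3, 1, 3, 2, 3, 1, 3]
loc = [1, 1, 2, 1, 2, 1, 1, 2, 2, 2]
month = [1, 1, 2, 1, 3, 1, 2, 1, 4, 5]

rows = build_covariate_rows(breed, loc, month)

# Whitespace-delimited text with no header line; rows must stay in the
# same animal order as the genotype and phenotype files.
covariate_text = "\n".join(" ".join(str(v) for v in row) for row in rows)
print(covariate_text)
```

With this sample the columns come out as intercept, Breed2, Breed3, Loc2, Month2, Month3, Month4, Month5, with breed 1, location 1, and month 1 acting as the reference levels. My understanding is that the resulting text would be saved to a file and passed to GEMMA via the `-c` option, but please correct me if that is wrong.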
Thank you for your time and expertise.