Question

GEMMA GWAS Analysis: Covariate Formatting and Encoding Best Practices

6 months ago

Patrick • 0

Dear all,

I'm conducting a genome-wide association study (GWAS) using GEMMA on data from three pig breeds, focusing on their meat traits. As a newcomer to GWAS analysis, I'd appreciate guidance on using covariates in my analysis.

I've completed variant calling, genotype imputation, and quality control on the data. I've also prepared my phenotype data (focusing on one trait) and calculated the kinship matrix using GEMMA.

For the analysis, I plan to use a univariate linear mixed model. I'd like to include breed, location, and month as covariates. However, I'm unsure about the correct approach.

Here's a sample section of my covariate file:

1s  Breed   Loc.    Month
1   1   1   1
1   1   1   1
1   2   2   2
1   3   1   1
1   1   2   3
1   3   1   1
1   2   1   2
1   3   2   1
1   1   2   4
1   3   2   5
......

The first column is the intercept (1s). I've added headers for clarity.

My concerns are:

1. Have I formatted the covariate file correctly?
2. By encoding breeds as 1, 2, and 3, am I implying a continuous relationship among them? Should I use one-hot encoding instead?

Example of one-hot encoding for breed, location, and month:

Breed1  Breed2  Breed3  Loc1  Loc2  Month1  Month2  Month3  Month4  Month5
1       0       0       1     0     1       0       0       0       0
0       1       0       1     0     1       0       0       0       0
0       0       1       0     1     0       1       0       0       0
.....

Should I adopt this approach? Any guidance on handling covariates in a GEMMA GWAS, along with general tips, would be greatly appreciated.
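For concreteness, here is a minimal Python sketch of how the integer-coded columns from my sample could be turned into indicator (dummy) columns. It rests on a few assumptions that I have not verified against my own run: that GEMMA expects the covariate file to have no header row and an explicit intercept column of 1s, and that month should be treated as categorical. The function names `dummy_code`, `build_covariates`, and `to_text` are hypothetical helpers of mine, not part of GEMMA. The sketch drops the first level of each factor as a reference, because a full one-hot set for a factor sums to the intercept column and makes the design matrix rank-deficient.

```python
def dummy_code(values, prefix):
    """Return (names, rows) of 0/1 indicators, dropping the first level as reference."""
    levels = sorted(set(values))
    kept = levels[1:]  # the reference level is absorbed by the intercept
    names = [f"{prefix}{lvl}" for lvl in kept]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return names, rows


def build_covariates(breed, loc, month):
    """Combine an intercept column with dummy-coded breed, location, and month."""
    names = ["intercept"]
    rows = [[1] for _ in breed]  # explicit column of 1s (assumed to be required)
    for values, prefix in ((breed, "Breed"), (loc, "Loc"), (month, "Month")):
        col_names, cols = dummy_code(values, prefix)
        names += col_names
        for row, extra in zip(rows, cols):
            row.extend(extra)
    return names, rows


def to_text(rows):
    """Render rows as space-separated lines with no header row."""
    return "\n".join(" ".join(str(x) for x in row) for row in rows)


# The ten sample rows shown earlier in this post (breed, location, month).
breed = [1, 1, 2, 3, 1, 3, 2, 3, 1, 3]
loc = [1, 1, 2, 1, 2, 1, 1, 2, 2, 2]
month = [1, 1, 2, 1, 3, 1, 2, 1, 4, 5]

names, rows = build_covariates(breed, loc, month)
print(names)          # column labels, for my own reference only
print(to_text(rows))  # content that would go into the covariate file
```

The printed rows could then be saved to a text file and passed to GEMMA with the `-c` option alongside the kinship matrix. Whether dropping a reference level per factor is the right choice for my data is part of what I am asking.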

Thank you for your time and expertise.

GWAS GEMMA Formatting Covariate • 574 views
