Hello,
I ran population structure analyses from K=2 to K=10 on plant samples from West Africa. Comparing the results with bioclimatic data (temperature, rainfall, altitude) and linguistic information, I observed an apparent relationship between the genetic structure and the bioclimatic variables, and some clusters appear to correlate with linguistic groups.
I want to test statistically whether the genetic structure (from K=2 to K=10) is related to the climate variables and linguistic groups.
My question is: how should I do this kind of test? For example, if I have 4 variables (temperature, rainfall, altitude, linguistic group), how can I show statistically that the structure at each K (K=2 to K=10) correlates with one or more of them?
We hypothesized that climatic or geographic variables (temperature, rainfall, altitude, latitude, and longitude), as well as ethnic or linguistic groups, would have significant effects on genetic structure. So we have 7 variables (factors) that we want to test, to see whether one or more of them caused the observed structure for each K, from K=2 to K=10.
I looked into how to investigate this and came up with the following idea:
1- A simple way to test this is to first add the genetic cluster assignment for each individual (the cluster in which that individual has the highest membership proportion), as in this example for K=5:
q1 q2 q3 q4 q5 indv country long lat temp rain clust alt ethnic lingue
0.0311253 1.00E-04 1.00E-04 0.137796 0.830879 CM03390 Togo 0.566666667 9.96666666 27.99583333 1105 clust5 142 konkomba Volta-Congo
1.00E-04 1.00E-04 1.00E-04 0.125601 0.874099 CM03396 Togo 0.916666667 10.03333333 27.85833333 1144 clust5 213 lamba Volta-Congo
1.00E-04 0.165953 1.00E-04 1.00E-04 0.833747 CM03400 Togo 1.1 9.8 25.79166667 1289 clust5 450 losso Volta-Congo
1.00E-04 0.0134303 1.00E-04 0.175039 0.811331 CM03403 Togo 1.2 9.816666667 26.1125 1264 clust5 408 losso Volta-Congo
1.00E-04 1.00E-04 1.00E-04 0.186677 0.813023 CM03406 Togo 0.75 9.883333333 27.90833333 1142 clust5 171 gangan Volta-Congo
1.00E-04 1.00E-04 1.00E-04 0.111694 0.888006 CM03409 Togo 0.833333333 10.01666667 27.68333333 1142 clust5 223 lamba Volta-Congo
0.029663 1.00E-04 1.00E-04 0.166367 0.80377 CM03418 Togo 0.95 9.666666667 27.42083333 1252 clust5 219 losso Volta-Congo
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03423 Togo 1.166666667 8.6 26.74166667 1190 clust3 294 lama Volta-Congo
0.00520464 0.00718895 0.987406 1.00E-04 1.00E-04 CM03424 Togo 0.7 9.566666667 27.625 1227 clust3 202 konkomba Volta-Congo
0.00123103 1.00E-04 0.997225 0.00134425 1.00E-04 CM03430 Togo 0.8 9.266666667 27.08333333 1343 clust3 281 bassari Atlantic
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03431 Togo 0.733333333 7.583333333 24.40833333 1569 clust3 544 akposso Volta-Congo
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03434 Togo 1.016666667 7.616666667 25.45 1403 clust3 418 akposso Volta-Congo
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03437 Togo 0.9 7.466666667 24.1708333 1580 clust3 579 akposso Volta-Congo
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03438 Togo 0.916666667 9.41666666 26.74583333 1344 clust3 328 kpele Mande Western
1.00E-04 1.00E-04 0.9996 1.00E-04 1.00E-04 CM03439 Togo 0.85 6.566666667 26.88333333 1136 clust3 118 kpele Mande Western
0.000485579 1.00E-04 1.00E-04 0.255801 0.743513 CM04487 Benin 1.383333333 10.5 27.1875 1117 clust5 383 ditamari Volta-Congo
0.0479849 1.00E-04 1.00E-04 0.0777588 0.874056 CM04489 Benin 0.783333333 10.4 28.05 1098 clust5 184 niende Volta-Congo
0.00804672 0.151608 0.0213265 1.00E-04 0.818919 CM04493 Benin 0.983333333 10.26666667 28.13333333 1082 clust5 203 ditamari Volta-Congo
1.00E-04 0.872051 1.00E-04 1.00E-04 0.127649 CM05734 Mali -7.033333333 13.61666667 27.15 662 clust2 329 bambara Manding West
0.00714138 0.820561 1.00E-04 0.0424508 0.129747 CM05736 Mali -7.183333333 13.88333333 27.19583333 619 clust2 308 sarakole Mande Western
1.00E-04 0.856973 1.00E-04 1.00E-04 0.142727 CM05737 Mali -7.516666667 14.01666667 27.03333333 626 clust2 358 bambara Manding West
and then use regression analysis to build models that describe the effect of variation in the predictor variables (in our case, the geographic, climatic, linguistic and ethnic variables) on the response variable (in our case the cluster assignment; clust1 to clust5 in this K=5 example). With an ANCOVA we would be able to get a p-value showing which variables have a significant effect on the response variable.
I am not sure whether this is the ideal approach, but I don't think it is far from what we want (to establish the relationship between the structure and the predictor variables).
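To make the idea concrete, here is a minimal sketch in plain Python, using nine individuals from the K=5 table above (three per cluster) and only the rainfall column. It is not the ANCOVA itself: as a simpler stand-in, it assigns each individual to its dominant cluster and then asks whether rainfall differs between clusters, using a one-way ANOVA F statistic with a permutation p-value. The function names (`dominant_cluster`, `f_statistic`, `permutation_p_value`) are hypothetical helpers written for this illustration, and the test assumes the individuals are independent, which may not hold for samples collected close together.

```python
import random
from collections import defaultdict

# Nine individuals from the K=5 table: (q1..q5 membership, annual rainfall).
rows = [
    ((0.0311253, 1e-4, 1e-4, 0.137796, 0.830879), 1105),  # CM03390
    ((1e-4, 1e-4, 1e-4, 0.125601, 0.874099), 1144),       # CM03396
    ((1e-4, 0.165953, 1e-4, 1e-4, 0.833747), 1289),       # CM03400
    ((1e-4, 1e-4, 0.9996, 1e-4, 1e-4), 1190),             # CM03423
    ((0.00520464, 0.00718895, 0.987406, 1e-4, 1e-4), 1227),  # CM03424
    ((0.00123103, 1e-4, 0.997225, 0.00134425, 1e-4), 1343),  # CM03430
    ((1e-4, 0.872051, 1e-4, 1e-4, 0.127649), 662),        # CM05734
    ((0.00714138, 0.820561, 1e-4, 0.0424508, 0.129747), 619),  # CM05736
    ((1e-4, 0.856973, 1e-4, 1e-4, 0.142727), 626),        # CM05737
]


def dominant_cluster(q):
    """Label of the cluster with the largest membership coefficient."""
    best = max(range(len(q)), key=lambda i: q[i])
    return f"clust{best + 1}"


def f_statistic(labels, values):
    """One-way ANOVA F statistic for values grouped by labels."""
    groups = defaultdict(list)
    for lab, v in zip(labels, values):
        groups[lab].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(labels, values, n_perm=2000, seed=0):
    """Shuffle the values across individuals to get a null distribution of F."""
    rng = random.Random(seed)
    observed = f_statistic(labels, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(labels, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


labels = [dominant_cluster(q) for q, _ in rows]
rain = [r for _, r in rows]
f_obs, p = permutation_p_value(labels, rain)
print(labels)
print(f"F = {f_obs:.2f}, permutation p = {p:.4f}")
```

The same call could be repeated for temperature, altitude, latitude and longitude, and for each K, but the resulting p-values would then need a multiple-testing correction. Note also that reducing each individual to a single cluster label throws away the admixture proportions in the q columns.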
Any suggestions? :)