Statistical test for the relationship between genetic structure and climate variables
5.2 years ago
Hann ▴ 110

Hello,

I have run population structure analyses from K=2 to K=10 on plant samples from West Africa. Comparing the results with bioclimatic data (i.e., temperature, rainfall, altitude) and linguistic information, I observed an apparent relationship between the genetic structure and the bioclimatic variables, and some clusters also coincide with linguistic groups.

I want to test statistically the relationship between the genetic structure (from K=2 to K=10) and the climate variables and linguistic groups.

My question is how to do this kind of test. I have 4 different variables (temperature, rainfall, altitude, linguistic group), and I want to test statistically whether the structure at each K (K=2 to K=10) correlates with one or more of these variables.

We hypothesize that climatic or geographical variables (i.e., temperature, rainfall, altitude, latitude, and longitude), as well as ethnic or linguistic group, have significant effects on genetic structure. So we have 7 variables (factors) to test, to see whether one or more of them explain the observed structure for each K, from K=2 to K=10.

I've looked into how to investigate this and came up with the following:

1- A simple way to test this: first add the genetic cluster assignment for each individual (the cluster in which that individual has its highest membership coefficient). See this example for K=5:

q1           q2           q3           q4           q5           indv     country  long          lat          temp         rain  clust   alt  ethnic    lingue
0.0311253    1.00E-04     1.00E-04     0.137796     0.830879     CM03390  Togo     0.566666667   9.96666666   27.99583333  1105  clust5  142  konkomba  Volta-Congo
1.00E-04     1.00E-04     1.00E-04     0.125601     0.874099     CM03396  Togo     0.916666667   10.03333333  27.85833333  1144  clust5  213  lamba     Volta-Congo
1.00E-04     0.165953     1.00E-04     1.00E-04     0.833747     CM03400  Togo     1.1           9.8          25.79166667  1289  clust5  450  losso     Volta-Congo
1.00E-04     0.0134303    1.00E-04     0.175039     0.811331     CM03403  Togo     1.2           9.816666667  26.1125      1264  clust5  408  losso     Volta-Congo
1.00E-04     1.00E-04     1.00E-04     0.186677     0.813023     CM03406  Togo     0.75          9.883333333  27.90833333  1142  clust5  171  gangan    Volta-Congo
1.00E-04     1.00E-04     1.00E-04     0.111694     0.888006     CM03409  Togo     0.833333333   10.01666667  27.68333333  1142  clust5  223  lamba     Volta-Congo
0.029663     1.00E-04     1.00E-04     0.166367     0.80377      CM03418  Togo     0.95          9.666666667  27.42083333  1252  clust5  219  losso     Volta-Congo
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03423  Togo     1.166666667   8.6          26.74166667  1190  clust3  294  lama      Volta-Congo
0.00520464   0.00718895   0.987406     1.00E-04     1.00E-04     CM03424  Togo     0.7           9.566666667  27.625       1227  clust3  202  konkomba  Volta-Congo
0.00123103   1.00E-04     0.997225     0.00134425   1.00E-04     CM03430  Togo     0.8           9.266666667  27.08333333  1343  clust3  281  bassari   Atlantic
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03431  Togo     0.733333333   7.583333333  24.40833333  1569  clust3  544  akposso   Volta-Congo
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03434  Togo     1.016666667   7.616666667  25.45        1403  clust3  418  akposso   Volta-Congo
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03437  Togo     0.9           7.466666667  24.1708333   1580  clust3  579  akposso   Volta-Congo
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03438  Togo     0.916666667   9.41666666   26.74583333  1344  clust3  328  kpele     Mande Western
1.00E-04     1.00E-04     0.9996       1.00E-04     1.00E-04     CM03439  Togo     0.85          6.566666667  26.88333333  1136  clust3  118  kpele     Mande Western
0.000485579  1.00E-04     1.00E-04     0.255801     0.743513     CM04487  Benin    1.383333333   10.5         27.1875      1117  clust5  383  ditamari  Volta-Congo
0.0479849    1.00E-04     1.00E-04     0.0777588    0.874056     CM04489  Benin    0.783333333   10.4         28.05        1098  clust5  184  niende    Volta-Congo
0.00804672   0.151608     0.0213265    1.00E-04     0.818919     CM04493  Benin    0.983333333   10.26666667  28.13333333  1082  clust5  203  ditamari  Volta-Congo
1.00E-04     0.872051     1.00E-04     1.00E-04     0.127649     CM05734  Mali     -7.033333333  13.61666667  27.15        662   clust2  329  bambara   Manding West
0.00714138   0.820561     1.00E-04     0.0424508    0.129747     CM05736  Mali     -7.183333333  13.88333333  27.19583333  619   clust2  308  sarakole  Mande Western
1.00E-04     0.856973     1.00E-04     1.00E-04     0.142727     CM05737  Mali     -7.516666667  14.01666667  27.03333333  626   clust2  358  bambara   Manding West
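To make the assignment step concrete, here is a minimal sketch of how the `clust` column can be derived from the q columns. It is written in Python only for illustration (the same logic is a one-liner in R), it uses three rows copied from the table, and the helper name `assign_cluster` is my own.

```python
# Three rows copied from the K=5 table: (indv, [q1, q2, q3, q4, q5]).
rows = [
    ("CM03390", [0.0311253, 1.00e-04, 1.00e-04, 0.137796, 0.830879]),
    ("CM03423", [1.00e-04, 1.00e-04, 0.9996, 1.00e-04, 1.00e-04]),
    ("CM05734", [1.00e-04, 0.872051, 1.00e-04, 1.00e-04, 0.127649]),
]


def assign_cluster(q):
    """Return (label, max_q) for the cluster with the largest membership coefficient."""
    best = max(range(len(q)), key=lambda i: q[i])
    return f"clust{best + 1}", q[best]


# Hard assignment: each individual goes to the cluster where its q is highest.
assignments = {indv: assign_cluster(q) for indv, q in rows}
for indv, (label, qmax) in assignments.items():
    print(indv, label, qmax)
```

One caveat with this hard assignment is that it discards the admixture information: CM04487, for example, is labelled clust5 although about a quarter of its membership is in cluster 4.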

Then use regression analysis to build models that describe the effect of variation in the predictor variables (in our case, the geographic, climatic, linguistic and ethnic variables) on the response variable (in our case, the cluster assignment; clust1 to clust5 in this K=5 example). With an ANCOVA we would get a p-value for each variable, showing which ones have a significant effect on the response variable.
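As a rough illustration of the kind of p-value I am after, here is a sketch for one continuous variable (rainfall) using the 21 individuals in the K=5 table. Note that this is not the regression/ANCOVA described above: it swaps in a simpler one-way ANOVA F statistic with a permutation p-value, and it treats rainfall as the measured quantity grouped by cluster rather than cluster as the response. The function names are my own, and Python is used only for illustration.

```python
import random

# Rainfall values grouped by assigned cluster, copied from the K=5 table.
rain = {
    "clust5": [1105, 1144, 1289, 1264, 1142, 1142, 1252, 1117, 1098, 1082],
    "clust3": [1190, 1227, 1343, 1569, 1403, 1580, 1344, 1136],
    "clust2": [662, 619, 626],
}


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_p(groups, n_perm=2000, seed=1):
    """Shuffle values across clusters and count how often F is at least as large."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        # Small tolerance so float rounding does not hide exact ties.
        if f_statistic(shuffled) >= observed - 1e-9:
            hits += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


observed_f, p_value = permutation_p(list(rain.values()))
print(f"F = {observed_f:.2f}, permutation p = {p_value:.4f}")
```

This ignores spatial structure: all clust2 individuals come from Mali, so a small p-value here could reflect geography as much as rainfall itself.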

I am not sure whether this is the ideal approach, but I don't think it is far from what we want (testing the relationship between the structure and the predictor variables).
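For the categorical variables (ethnic or linguistic group), one alternative I can think of is a contingency-table test between cluster and group. Below is a sketch using the `clust` and `lingue` columns of the K=5 table. Because many cells have very small counts, it uses a permutation p-value for the chi-square statistic instead of the asymptotic one. The function names are my own, and Python is used only for illustration.

```python
import random
from collections import Counter

# (clust, lingue) pairs copied from the 21 rows of the K=5 table.
pairs = (
    [("clust5", "Volta-Congo")] * 10
    + [("clust3", "Volta-Congo")] * 5
    + [("clust3", "Atlantic")]
    + [("clust3", "Mande Western")] * 2
    + [("clust2", "Manding West")] * 2
    + [("clust2", "Mande Western")]
)
clusters = [c for c, _ in pairs]
groups = [g for _, g in pairs]


def chi_square(clusters, groups):
    """Pearson chi-square statistic for the cluster x group contingency table."""
    n = len(clusters)
    row = Counter(clusters)
    col = Counter(groups)
    cell = Counter(zip(clusters, groups))  # missing cells count as 0
    stat = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            stat += (cell[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_p(clusters, groups, n_perm=2000, seed=1):
    """Shuffle group labels and count how often chi-square is at least as large."""
    rng = random.Random(seed)
    observed = chi_square(clusters, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so float rounding does not hide exact ties.
        if chi_square(clusters, shuffled) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


chi2, p_value = permutation_p(clusters, groups)
print(f"chi-square = {chi2:.2f}, permutation p = {p_value:.4f}")
```

With 7 variables and 9 values of K this would mean many tests, so the p-values would presumably need a multiple-testing correction as well.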

Any suggestions? :)

SNP population genetics R • 841 views