Hello,
I am working on a large SNP data set from GBS with over 300 individuals from 34 populations, which together comprise three closely related species. I tried various assignment tests to examine the population structure, but I would still like to see the clustering pattern using a different approach. Unfortunately, I am not an expert in tree building. For a start, I am not sure whether neighbor joining will give me an informative inference of the relationships among species and populations. Also, because I am using SNP data, many individuals are heterozygous at many loci. Which software takes ambiguity codes into account if I run a neighbor-joining analysis on my SNP data?
Any answers will be very much appreciated.
Thanks in advance.
Hi, it might be worth running PCA on your data first. Random forests are robust classifiers, and once you have reduced the number of features (SNPs) to a manageable size, you can build a nice model.
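Either way, the genotypes need to be numeric before PCA, a random forest, or a distance-based tree can use them, and heterozygous calls do not have to be thrown away for that. Here is a minimal sketch, assuming each genotype is written as a single IUPAC code (R = A/G, Y = C/T, and so on) and that you have one reference allele per locus. The function names (dosage, encode, distance) and the example sequences are made up for illustration, not taken from any particular package. Each call becomes a count of non-reference alleles (0, 1, or 2), and missing calls are skipped when comparing two individuals.

```python
# IUPAC nucleotide codes mapped to the two alleles they stand for.
# Only the single-base and two-base codes are handled; anything else
# (N, -, three-base codes) is treated as missing in this sketch.
IUPAC = {
    "A": "AA", "C": "CC", "G": "GG", "T": "TT",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
}


def dosage(call, ref):
    """Return the number of non-reference alleles (0, 1 or 2), or None if missing."""
    alleles = IUPAC.get(call.upper())
    if alleles is None:
        return None
    return sum(1 for allele in alleles if allele != ref)


def encode(calls, refs):
    """Encode one individual's genotype string against per-locus reference alleles."""
    return [dosage(call, ref) for call, ref in zip(calls, refs)]


def distance(x, y):
    """Mean absolute dosage difference over loci called in both, scaled to 0..1."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not shared:
        return float("nan")
    return sum(abs(a - b) for a, b in shared) / (2 * len(shared))


# Hypothetical example: four loci, three individuals.
refs = "ACGT"
samples = {"ind1": "ACGT", "ind2": "RCGT", "ind3": "GNTT"}
encoded = {name: encode(calls, refs) for name, calls in samples.items()}

names = sorted(encoded)
matrix = [[distance(encoded[a], encoded[b]) for b in names] for a in names]
```

The encoded rows can be stacked into an individuals-by-SNPs matrix for PCA, and the pairwise matrix is the kind of input a neighbor-joining routine expects. Whether this simple allele-count distance is the right one for your species is a separate question. It is only meant to show that ambiguity codes can be kept as information rather than discarded.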