I'm using Lositan with 734 SNPs for 72 individuals sampled across a latitudinal cline, possibly structured in 2 main clusters and possibly showing IBD.
If I use few iterations (50K) the envelope looks good (Fig1), but with more (>500K) it looks terrible (Fig2): too ragged, with too many outliers. I've played with all the parameters (number of pops, force Fst, FDR, etc.) and keep getting the same result.
There is something that is not clear to me at this stage:
What value is Lositan reporting for the sample size (is that 10?). This is different from the total number of simulated populations (which would start as 2 in your case). If I understand correctly, you have 76 samples. Are these approximately equally divided between the two populations? Lositan should be reporting back a much bigger number for sample size (surely not 10 if you have 76 samples).
The effect could be explained by low sample size and possibly low number of populations.
This is, I am afraid, due to sampling effects related to the number of populations. There are two alternatives here:
1. You think that you have 2 populations overall (i.e. you sampled 2 out of 2 in the wild). In this case the CI that Lositan is computing is not trustworthy.
2. You think that you have more populations overall. In this case the sampling issue disappears.
So, Lositan is not appropriate if you have only 2 populations in the wild, and I would recommend not using it in that case. But if you are sampling 2 populations out of many, then you can probably still use it. Of course, how many populations you have might be complex to estimate, but if you are sure you have more than ~10 wild pops, the results should not vary much.
Thanks for the detailed answer, Tiago. I think the key point is the difference between sampling populations and genetic clusters. It seems fair enough to use sampling populations in this case, but it would be handy to be able to test a few (3 or fewer) genetic clusters against each other, because it is at that level that most species are structured, and someone may want to look for selection at that level.
So, are you splitting this into two populations, or more? What is the sample size reported when you load the data?
Hello Tiago,
Thanks for your quick reply. Yes, my file has two populations; the sample size reported is 2 populations. I'm basing this on preliminary STRUCTURE and DAPC analyses. However, I have played with the expected total pops quite a bit (2, 10, 20, 50, 100) and keep getting the same behaviour. Should I split the file into more "artificial" populations? I see how that might help, but then I guess the question is whether the result will depend on how populations are assigned?
Cheers,
Just to be sure: on the bottom middle panel you have:
Mean Fst
Expected total pops
Mutation Model
Sample size.
Are you reporting the "Expected total pops" or the "sample size"?
Hello,
Mean Fst= 0.05
Expected total pops = 2, 10, 40 and 100 (I have used different values and get the same results)
Mutation model = Infinite sites
Sample size = 69 individuals, 734 loci
Cheers
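To double-check what the input file itself contains, independently of what the Lositan panel shows, the populations and individuals in a Genepop file can be counted with a short script. This is only a minimal sketch: the function name `summarize_genepop` is made up for illustration, and it assumes the standard Genepop layout (a title line, locus names either one per line or comma-separated, then `Pop` lines separating populations, with one individual per line).

```python
def summarize_genepop(text):
    """Return (number_of_loci, [individuals per population]) for Genepop text."""
    # Drop blank lines; line 0 is assumed to be the free-text title.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Every "Pop" line (case-insensitive) after the title starts a population.
    pop_idx = [i for i, ln in enumerate(lines) if i > 0 and ln.lower() == "pop"]
    if not pop_idx:
        raise ValueError("no 'Pop' separator found")
    # Locus names sit between the title and the first "Pop" line.
    loci = [name.strip()
            for ln in lines[1:pop_idx[0]]
            for name in ln.split(",")
            if name.strip()]
    # Each population runs from its "Pop" line to the next one (or end of file).
    bounds = pop_idx[1:] + [len(lines)]
    sizes = [end - start - 1 for start, end in zip(pop_idx, bounds)]
    return len(loci), sizes


example = """Toy data set
loc1
loc2
Pop
ind1 , 0101 0102
ind2 , 0102 0202
Pop
ind3 , 0101 0101
"""
n_loci, sizes = summarize_genepop(example)
print(n_loci, sizes)  # 2 [2, 1]
```

If the number of populations or the per-population counts differ from what you expect, the population assignment in the file is worth checking before changing any simulation parameters.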
There is some improvement when the original file is divided into 9 populations (see fig.). However, there are still some spikes at low He and Fst. This was using expected pops = 20, subsample size = 50 and an assumed Fst of 0.06.
Fig: http://tinypic.com/r/2mrsnef/8
However, if I use the default parameters (expected pops = 9 and subsample size = 10), I get many more "spikes".
Cheers
Dear all,
Greetings. I am trying to use Lositan for my SNP dataset. I have tried several times, but it does not seem to run on my data. My SNP data are in VCF format; I converted them to Genepop format and loaded the file into the program, but it did not run!
I assume something went wrong during the file conversion, or somewhere else. Could you help me work out how to proceed with this issue?
Thanks for your attention,
Amin
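When a converted file will not load, one thing that can be checked without Lositan is whether the Genepop file is internally consistent. The sketch below is an assumption about what commonly goes wrong after a VCF conversion, not a description of Lositan's own parser: the helper `check_genepop` is hypothetical, and it assumes diploid genotypes coded as 4 or 6 digits, with a comma after each individual name.

```python
import re


def check_genepop(text):
    """Return a list of problems found in Genepop text (empty if none)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pop_idx = [i for i, ln in enumerate(lines) if i > 0 and ln.lower() == "pop"]
    if not pop_idx:
        return ["no 'Pop' line found"]
    # Locus names: one per line or comma-separated, before the first "Pop".
    loci = [n.strip()
            for ln in lines[1:pop_idx[0]]
            for n in ln.split(",")
            if n.strip()]
    problems = []
    for ln in lines[pop_idx[0] + 1:]:
        if ln.lower() == "pop":
            continue
        if "," not in ln:
            problems.append(f"missing comma after individual name: {ln!r}")
            continue
        name, genos = ln.split(",", 1)
        name = name.strip()
        calls = genos.split()
        # Every individual needs exactly one genotype per locus.
        if len(calls) != len(loci):
            problems.append(
                f"{name}: {len(calls)} genotypes, expected {len(loci)}")
        # Assumed coding: diploid, 2 or 3 digits per allele (e.g. 0102).
        bad = [g for g in calls if not re.fullmatch(r"\d{4}|\d{6}", g)]
        if bad:
            problems.append(f"{name}: unexpected genotype code {bad[0]!r}")
    return problems


bad_file = """Toy data set
loc1
loc2
Pop
ind1 , 0101 0102
ind2 , 0102
ind3 0101 0101
ind4 , 0101 01x2
"""
for problem in check_genepop(bad_file):
    print(problem)
```

An empty result does not guarantee that the program will accept the file, but any problem listed here points to the conversion step as the first place to look.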