I am running DAPC on a SNP dataset using:
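library(adegenet)
x <- input_file                               # SNP data as read in (placeholder)
x1 <- as.data.frame(t(x))                     # transpose the input
gen <- as.genlight(x1)                        # convert to a genlight object
grp <- find.clusters(gen, max.n.clust = 10)   # infer clusters
dapc1 <- dapc(gen, grp$grp)                   # DAPC using the inferred clusters
scatter(dapc1)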
But the scatter plot shows clusters without the individual samples represented as dots. I want to make a scatter plot which represents individual samples as dots in the clusters. I am not sure what is wrong or missing in my code. Thank you for the help!
What does the output look like? I don't understand what clusters shown without data points mean.
Some possibilities that come to mind: the scatter() function could have been redefined in your environment, or your data points all fall onto the cluster centres. Try setting pch = 19 and other customization parameters to make the points visible.
As output, I am only getting the center boxes. I have found an example here similar to what I am getting. I think customization would help to modify the data points, but in my case no data points are shown at all.
If your plot is like the example you linked to, the issue seems to be that the data points all pile up onto the cluster centres and are masked by the cluster label (i.e. the square with the cluster number). Check the next plot in the example to see what I mean.
This is exactly what happened to me. The labels were covering up the individuals. I thought I was losing my mind. Thank you!
Hello Jenna, were you able to solve the problem in the end? I see that at the link above (posted by Jean-Karim Heriche) there is a graph with the same problem, but I could not find the solution ANYWHERE!
If your data points pile up on top of each other, this means they have the same coordinates. The solution is to make the coordinates different either upstream by changing the way you process your data or at the plot level by adding a bit of noise to the coordinates. This is what the R jitter() function is for.
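As a rough illustration of the plot-level fix, here is a minimal sketch in base R. The coordinates, the cluster factor and the noise amount of 0.15 are made up for the example; with a real DAPC object you would jitter the plotted coordinates instead.

```r
# Hypothetical coordinates: 30 samples sitting on only 3 distinct positions
set.seed(1)
coords <- data.frame(
  LD1 = rep(c(-2, 0, 2), each = 10),
  LD2 = rep(c(1, -1, 1), each = 10)
)
cluster <- factor(rep(1:3, each = 10))

# jitter() adds uniform noise in [-amount, amount] to each value
jittered <- data.frame(
  LD1 = jitter(coords$LD1, amount = 0.15),
  LD2 = jitter(coords$LD2, amount = 0.15)
)

# The overlapping samples now show up as separate dots
plot(jittered$LD1, jittered$LD2,
     col = as.integer(cluster), pch = 19,
     xlab = "LD1", ylab = "LD2")
```

Keep the noise small relative to the distance between clusters, otherwise the plot starts to misrepresent the data.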
If points are covered by labels, then remove the labels. It's generally a bad idea to put labels in a plot for exactly this reason. A legend is preferable or, if that's not suitable, put the labels outside the plot and use light lines to link them to points or areas in the plot.
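For the adegenet scatter plot specifically, something along these lines should hide the label boxes and show a legend instead. This is only a sketch: it uses the built-in iris data as a stand-in for SNP data, and the values chosen for n.pca, n.da and the legend position are arbitrary assumptions for the example.

```r
library(adegenet)

# Stand-in data (assumption for illustration): iris measurements instead of SNPs
dapc_demo <- dapc(iris[, 1:4], grp = iris$Species, n.pca = 3, n.da = 2)

# clabel = 0 suppresses the cluster label boxes drawn at the centres;
# legend = TRUE identifies the clusters by colour instead
scatter(dapc_demo,
        clabel = 0,
        legend = TRUE,
        posi.leg = "topleft",
        pch = 19,
        cstar = 0,         # no lines linking points to the cluster centre
        scree.da = FALSE)  # no inset plot of the DA eigenvalues
```

With the object from the question, the equivalent call would be scatter(dapc1, clabel = 0, legend = TRUE, pch = 19).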
Thank you very much Jean-Karim. The problem is that the points do not have the same coordinates, nor do the labels cover them. I know this because if I plot the data using a simple plot function
plot(dapc1$tab[,1:2], col=grp$grp, pch=c(grp$grp))
then I can clearly see all the points!
How do you suggest I proceed? Thank you again!!
So this is probably a different problem and belongs in its own post. It is also probably not bioinformatics-specific but an R programming question, maybe better suited to StackOverflow. I would look into how you wrote your plotting code; for example, you may be plotting only the labels. Again, I think plotting both the labels and the points is a bad idea. Convert the labels into a legend and color/shape the points accordingly.
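A minimal base R sketch of the legend approach is below. The coordinates and cluster assignments are simulated stand-ins; with a real DAPC object they would presumably come from dapc1$ind.coord[, 1:2] (the positions of individuals on the discriminant functions) and grp$grp.

```r
# Hypothetical stand-ins for dapc1$ind.coord[, 1:2] and grp$grp
set.seed(42)
coords <- cbind(
  LD1 = rnorm(60, mean = rep(c(-3, 0, 3), each = 20)),
  LD2 = rnorm(60, mean = rep(c(1, -2, 1), each = 20))
)
cluster <- factor(rep(1:3, each = 20))

k <- nlevels(cluster)
cols <- seq_len(k) + 1    # one colour per cluster from the default palette
pchs <- seq_len(k) + 14   # filled symbols 15, 16, 17

# Indexing by the factor gives every sample its cluster's colour and symbol
plot(coords[, "LD1"], coords[, "LD2"],
     col = cols[cluster], pch = pchs[cluster],
     xlab = "LD1", ylab = "LD2")

# The legend replaces labels drawn on top of the points
legend("topright",
       legend = paste("Cluster", levels(cluster)),
       col = cols, pch = pchs, bty = "n")
```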