One notion of a cluster is a tightly associated group in which every member is similar to every other member. Another notion is a group in which a continuous path of small transitions leads from any one member to any other. Both are valid ways of looking at clusters. The first notion can be more helpful when you are fishing for patterns, since each group carries a lot of signal, and it tends to be more stable. The second notion is the one single linkage follows, which makes it sensitive to certain kinds of outliers that serve as 'missing links' between two groups and cause the algorithm to merge them.
If single linkage makes more sense for your data, it implies that the local structure in different parts of your data space is what matters most. Single linkage merges two clusters if any member of one is close to any member of the other. Ward merges based on within-cluster variance, that is, how similar all the members of a cluster are before and after the merge, and at each step it picks the merge that increases that variance the least. So the two linkages can yield very different results in certain situations. Imagine a dataset with several mostly distinct clusters that smush together slightly at the edges: single linkage will tend to chain them into one big cluster through the touching edges, whereas Ward will tend to keep them apart.
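To make the 'missing link' effect concrete, here is a minimal sketch using SciPy's `linkage` and `fcluster`. The data are invented for illustration (two tight one-dimensional groups plus three sparse points spanning the gap), and the cut distance of 1.5 and the choice of two Ward clusters are assumptions, not recommendations.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight 1-D groups (toy data, not taken from any real study).
group_a = np.array([0.0, 0.2, 0.4, 0.6, 0.8])
group_b = np.array([5.0, 5.2, 5.4, 5.6, 5.8])
# Three sparse "missing link" points spanning the gap between the groups.
bridge = np.array([1.8, 2.9, 4.0])


def as_observations(values):
    """SciPy expects one row per observation, so reshape to (n, 1)."""
    return np.asarray(values, dtype=float).reshape(-1, 1)


clean = as_observations(np.concatenate([group_a, group_b]))
bridged = as_observations(np.concatenate([group_a, group_b, bridge]))

# Single linkage cut at distance 1.5: points end up in the same cluster
# whenever a chain of steps no longer than 1.5 connects them.
single_clean = fcluster(linkage(clean, method="single"),
                        t=1.5, criterion="distance")
single_bridged = fcluster(linkage(bridged, method="single"),
                          t=1.5, criterion="distance")

# Ward on the bridged data, asking for two clusters.
ward_bridged = fcluster(linkage(bridged, method="ward"),
                        t=2, criterion="maxclust")

print("single, no bridge  :", len(set(single_clean)), "clusters")
print("single, with bridge:", len(set(single_bridged)), "clusters")
print("ward labels, group A:", ward_bridged[:5])
print("ward labels, group B:", ward_bridged[5:10])
```

On this toy data, single linkage separates the two groups until the bridge points are added, at which point everything chains into one cluster. Ward, asked for two clusters, still keeps the two cores apart and assigns each bridge point to one side. Real data will rarely be this clean, so treat it as an illustration of the mechanism only.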
I spent several years working on different clustering techniques, and my takeaway message is: keep it simple. Use measures of distance that are not too complicated or that are commonly used in your field. My reason is simple: the result is easier to interpret and intelligible to a larger group of people. Use relatively simple techniques like hierarchical clustering or K-means. Use more than one method, and favor robust results that show up in more than one approach.
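One way to check whether a grouping shows up in more than one approach is to measure how often two methods agree on whether a pair of points belongs together (the Rand index). The sketch below is only an illustration: the toy data, the helper name `rand_index`, and the hand-picked K-means starting points are all assumptions made so that the run is reproducible.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

# Toy data (hypothetical): three 3x3 grids of points around distant centers.
offsets = np.array([(dx, dy)
                    for dx in (-0.5, 0.0, 0.5)
                    for dy in (-0.5, 0.0, 0.5)])
centers = np.array([(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)])
X = np.vstack([c + offsets for c in centers])  # shape (27, 2)


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree.

    A pair "agrees" if both labelings put the two points together,
    or both put them apart. The label values themselves do not matter.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    upper = np.triu_indices(len(a), k=1)  # each unordered pair once
    return float(np.mean(same_a[upper] == same_b[upper]))


# Two hierarchical methods, each cut into three clusters.
ward_labels = fcluster(linkage(X, method="ward"),
                       t=3, criterion="maxclust")
average_labels = fcluster(linkage(X, method="average"),
                          t=3, criterion="maxclust")

# K-means started from one row per grid (an assumption, chosen for
# reproducibility; in practice you would use several random restarts).
initial_centroids = X[[0, 9, 18]].copy()
_, kmeans_labels = kmeans2(X, initial_centroids, minit="matrix")

print("ward vs average :", rand_index(ward_labels, average_labels))
print("ward vs k-means :", rand_index(ward_labels, kmeans_labels))
```

On well-separated toy data like this the methods agree completely. On real data the agreement will usually be lower, and the groups (or pairs of points) that stay together across methods are the ones I would trust most.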
A clustering job is a win if you end up with fewer groups than you started with. So in theory, if your raw data has 1000 observations and you have clustered them into 10 groups, 10 is still less complexity to deal with than 1000. I tend to favor clustering that optimizes for members that are all very similar to each other, because downstream statistical analysis of such clusters is likely to yield simple models for each group.
In general, it is not surprising that we can't throw a mass of data of unknown structure into a procedure with few assumptions and consistently get biologically meaningful results. After all, the clustering procedure is not a biologist. So, the person doing the clustering must look for what biology he or she can find.
Clustering is indeed an art.
Dear APJ, this is almost identical to the question you asked before. I strongly agree with Steve and suggest you study the resources he mentions. It can be hard to accept that there is no absolute best method and that all methods have their merits and applications. Bear in mind, though, that these are unsupervised methods and there is no gold standard. In supervised classification and machine learning one ideally has well-annotated data, so a method can be evaluated objectively in terms of specificity, sensitivity, ROC curves and so on; none of that is possible for cluster analysis. The best way to deal with this is to take biological knowledge into account, as you have already attempted.