My question is a little bit biology/translation based, but it is related to bioinformatics analysis. I have an expression dataset with a limited number of samples and no replicates. The experiment has 4 cell lines: one is primary and the rest are differentiating into two different lineages. I recognize the importance of replicates and the limitations of such data in terms of reliability.

Based on differences in expression in two-sample comparisons, I selected around 1500 genes that passed a specified threshold cut-off. When I used hierarchical clustering (average linkage, Euclidean distance), it gave me three clusters of samples, which makes sense from a biological point of view. However, several genes do not discriminate perfectly between two or three of the samples. Therefore I want to select a subset of genes that clearly discriminates the three groups.

From the same clustering, when I looked at the clusters of genes, there were 5 clusters. When I cut the tree into these clusters, I get one cluster that discriminates perfectly between the three groups. I re-clustered that subset of the data (the one cluster of genes identified from the 1500 genes) and it gave me meaningful results. I also took that subset and ran PCA and k-means clustering, and both mirror the same separation into three clusters. I understand that I am not basing my interpretation on statistics.

My questions are: Is this a reasonable approach, even though it is qualitative? Is anyone aware of genes being selected from sub-clusters like this? I tried to search but could not find a publication on this per se.
Thanks for your help.