Hello All,
Can somebody please tell me whether this PCA result is a good one, and what the recommended way to validate it is?
Note: the PCA is based on around 20 features and the samples are around 100.
Thanks a lot
We can see a clear separation with respect to the two components you are plotting, but beyond that there is not enough information to make any judgement. You need to tell us what experiment you are working on and whether these components represent the main effect you are trying to study.
@Devon and @Kevin, thank you both. I am picking genes at random, and most of them are not differentially expressed, or at least not statistically significantly so. So my point is that perhaps only 3 of those 20 genes are differentially expressed and drive this result. Can this be significant? Also, would a heatmap and clustering be enough to demonstrate this separation? And what if I use a classification method such as SVM? I have already applied one, and the accuracy and Kappa values are very high.
Picking up genes randomly does not sound scientific in this situation - why would you do that? Why not do PCA on the entire dataset?
Usually, people perform a differential expression analysis and then subset their original data matrix with the statistically significant genes. Clustering with heatmap generation may then be performed on the subset data matrix.
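The usual workflow described above can be sketched roughly as follows. This is only an illustration on synthetic data: the per-gene Welch t-statistic stands in for a proper differential expression tool, and the cutoff `t_cutoff`, the matrix sizes, and the simulated effect are all assumptions made for the example, not values from this thread.

```python
import numpy as np

def welch_t(X, labels):
    """Per-gene Welch t-statistic; rows of X are samples, columns are genes."""
    a, b = X[labels == 0], X[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se

# Synthetic stand-in for an expression matrix: 100 samples x 200 genes,
# where only the first 10 (hypothetical) genes differ between the groups.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 200))
X[labels == 1, :10] += 2.0

t = welch_t(X, labels)
t_cutoff = 4.0  # assumed threshold; a real analysis would use adjusted p-values
selected = np.flatnonzero(np.abs(t) > t_cutoff)

# Subset the data matrix to the selected genes and z-score each gene,
# which is the usual input for a clustered heatmap.
X_sub = X[:, selected]
Z = (X_sub - X_sub.mean(axis=0)) / X_sub.std(axis=0, ddof=1)
print(f"{len(selected)} genes selected; heatmap matrix shape {Z.shape}")
```

The matrix `Z` is what would then be passed to a clustering or heatmap function of your choice.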
I have two answers.
First, if a subset of genes gives me the same output as the entire dataset, why is that not useful and scientific? It gives the same good result with less effort and information. What do you think?
Second, following what others generally do, such as DE analysis and a heatmap, is not mandatory, and I believe it discourages new approaches.
Hey, well, in that case, you should perform the random sampling many times and then check the reproducibility of the results. This kind of repeated resampling is closely related to bootstrapping.
I do not 100% understand your second point. Clustering / heatmap can show to what degree a panel of genes can segregate, for example, cases and controls.
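As a rough sketch of the repeated random sampling suggested above, one could draw many random 20-gene subsets, run PCA on each, and record how well the first two components separate the groups. Everything here is an assumption for illustration: the data are synthetic, the helper names (`pc_scores`, `separation`, `resample_separation`) are made up, and the separation score (a standardized gap between group means) is just one simple choice among many.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def separation(scores, labels):
    """Largest standardized gap between the two group means over the given PCs."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled_sd = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return float(np.max(np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled_sd))

def resample_separation(X, labels, n_genes=20, n_rounds=200, seed=0):
    """Separation score for many random gene subsets (drawn without replacement)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_rounds)
    for i in range(n_rounds):
        genes = rng.choice(X.shape[1], size=n_genes, replace=False)
        out[i] = separation(pc_scores(X[:, genes]), labels)
    return out

# Synthetic stand-in: 100 samples x 500 genes, 25 hypothetical genes differ.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 500))
X[labels == 1, :25] += 2.0

sep = resample_separation(X, labels)
full = separation(pc_scores(X), labels)
print(f"full dataset: {full:.2f}")
print(f"random 20-gene subsets: median {np.median(sep):.2f}, "
      f"5th-95th percentile {np.percentile(sep, 5):.2f}-{np.percentile(sep, 95):.2f}")
```

If the separation seen with one particular 20-gene subset sits far out in the tail of this distribution, that suggests it depends on which genes happened to be drawn rather than being a reproducible property of random subsets.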
What question do you want to answer?
How to add images to a Biostars post
Actually, I asked a question; I was not trying to answer one :) Thanks for the link too.