Can this PCA be considered a good result?
5.7 years ago
fernardo ▴ 180

Hello All,

Can somebody please tell me whether this PCA result is a good one, and what the best way to validate it would be?

Note: the PCA is based on around 20 features and the samples are around 100.

[Image: PCA plot of the first two components]

Thanks a lot

machine learning PCA RNA-seq NGS • 2.3k views

What question do you want to answer?

How to add images to a Biostars post


Actually, I asked a question; I am not trying to answer one :) Thanks for the link too.

5.7 years ago
GenoMax 147k

We can see a clear separation with respect to the two components you are plotting, but beyond that there is not enough information to make any judgement. You need to provide additional information about what experiment you are working on and whether these components represent the main effect you are trying to study.


Thanks. The study compares two conditions (disease vs. normal).


Then it looks like you have a clear difference between them.


You are just doing PCA using the differentially expressed genes, right? - 20 genes? You may also want to show the separation in a cluster dendrogram and heatmap.


@Devon and @Kevin, thanks to you both. I am picking genes randomly, and most of them are not differentially expressed, or at least not to a statistically significant degree. So my point is that perhaps only 3 of those 20 genes are differentially expressed and drive this result. Can this be significant? Also, would a heatmap and clustering be enough to prove this separation? And what if I use a classification method such as SVM? I have already applied one, and the accuracy and Kappa value are very high.
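One way to check whether a high accuracy and Kappa reflect real signal is to compute them on held-out samples and compare against the same procedure run on shuffled labels. The sketch below is only an illustration under stated assumptions: the data are synthetic (100 samples, 20 genes, 3 of them shifted between conditions), and a simple nearest-centroid rule stands in for the SVM so that the example needs nothing beyond NumPy. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the setting described above (an assumption, not real data):
# 100 samples, 20 genes, of which 3 differ between conditions.
labels = np.repeat([0, 1], 50)          # 0 = normal, 1 = disease
expr = rng.normal(size=(100, 20))
expr[labels == 1, :3] += 1.5


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two classes: agreement corrected for chance."""
    observed = np.mean(y_true == y_pred)
    expected = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in (0, 1))
    return (observed - expected) / (1 - expected)


def cv_predict_nearest_centroid(x, y, n_folds=5, rng=rng):
    """Held-out predictions from k-fold CV with a nearest-centroid rule.

    Nearest-centroid is a stand-in for the SVM mentioned in the thread.
    """
    order = rng.permutation(len(y))
    preds = np.empty_like(y)
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        # Class centroids are estimated from the training samples only.
        centroids = np.stack([x[train][y[train] == c].mean(axis=0) for c in (0, 1)])
        dists = ((x[fold][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        preds[fold] = dists.argmin(axis=1)
    return preds


preds = cv_predict_nearest_centroid(expr, labels)
accuracy = np.mean(preds == labels)
kappa = cohen_kappa(labels, preds)

# Baseline: the same procedure on shuffled labels should give kappa near 0.
shuffled = rng.permutation(labels)
null_kappa = cohen_kappa(shuffled, cv_predict_nearest_centroid(expr, shuffled))

print(f"accuracy={accuracy:.2f} kappa={kappa:.2f} shuffled-label kappa={null_kappa:.2f}")
```

If the values on the real labels are not clearly above the shuffled-label baseline, the apparent performance is probably not driven by the condition.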


Picking genes randomly does not sound scientific in this situation. Why would you do that? Why not do PCA on the entire dataset?

Usually, people perform a differential expression analysis and then subset their original data matrix with the statistically significant genes. Clustering with heatmap generation may then be performed on the subset data matrix.
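The usual workflow described above (test each gene, keep the strongest ones, then analyse the subset matrix) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not a recommended pipeline: a per-gene Welch t-statistic stands in for a proper differential expression analysis, which for RNA-seq counts would normally be done with a dedicated tool such as DESeq2, edgeR, or limma. The names `welch_t` and `pca_scores` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption): 100 samples x 200 genes, two conditions.
n_per_group, n_genes = 50, 200
labels = np.repeat([0, 1], n_per_group)   # 0 = normal, 1 = disease
expr = rng.normal(size=(2 * n_per_group, n_genes))
expr[labels == 1, :10] += 2.0             # assume the first 10 genes truly differ


def welch_t(expr, labels):
    """Per-gene Welch t-statistic between the two groups."""
    a, b = expr[labels == 0], expr[labels == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return (b.mean(axis=0) - a.mean(axis=0)) / se


def pca_scores(matrix, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


t = welch_t(expr, labels)
top = np.argsort(-np.abs(t))[:20]         # keep the 20 genes with the largest |t|
subset = expr[:, top]                     # subset matrix for clustering / heatmap / PCA
scores = pca_scores(subset)

gap = abs(scores[labels == 1, 0].mean() - scores[labels == 0, 0].mean())
print(f"PC1 gap between group means: {gap:.2f}")
```

The `subset` matrix is what would then be passed to hierarchical clustering and a heatmap; here it is only projected with PCA to keep the example short.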


Here are two answers.

First, if a subset of genes gives me the same output as the entire dataset, why would it not be useful and scientific, given that it yields the same good result with less effort and information? What do you think?

Second, following what others generally do, such as DE analysis and heatmaps, is not mandatory, and I believe it discourages new approaches.


Hey, well, in that case, you should be performing the random sampling many times and then checking the reproducibility of the results. Another name for this is bootstrapping.

I do not 100% understand your second point. Clustering / heatmap can show to what degree a panel of genes can segregate, for example, cases and controls.
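The repeated random sampling suggested above could look roughly like the sketch below. It draws many random 20-gene subsets (without replacement within each draw), measures how far apart the two groups sit on PC1 for each subset, and repeats the whole procedure with shuffled labels as a reference. Everything here is an assumption for illustration: the data are synthetic, and the separation score (difference of group means on PC1 in pooled standard deviations) is just one possible choice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (an assumption): 100 samples x 500 genes,
# with 20% of the genes shifted in the disease group.
n_per_group, n_genes = 50, 500
labels = np.repeat([0, 1], n_per_group)
expr = rng.normal(size=(2 * n_per_group, n_genes))
expr[labels == 1, :100] += 2.0


def pc1_separation(matrix, labels):
    """Absolute difference of group means on PC1, in pooled-SD units."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    a, b = pc1[labels == 0], pc1[labels == 1]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


def resample_separation(expr, labels, subset_size=20, n_draws=200, rng=rng):
    """Separation score for many random gene subsets of a fixed size."""
    draws = []
    for _ in range(n_draws):
        genes = rng.choice(expr.shape[1], size=subset_size, replace=False)
        draws.append(pc1_separation(expr[:, genes], labels))
    return np.array(draws)


observed = resample_separation(expr, labels)

# Reference distribution: same procedure, but with the labels shuffled.
shuffled = rng.permutation(labels)
null = resample_separation(expr, shuffled)

print(f"median separation: real labels {np.median(observed):.2f}, "
      f"shuffled labels {np.median(null):.2f}")
```

A result is reproducible in this sense when most random subsets separate the groups clearly better than the shuffled-label reference, rather than only a few lucky draws.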


Yes, exactly, I do random sampling / bootstrapping.
