I'm analyzing RNA-seq data and plotted this PCA, showing PC3 against PC4.
Initially, I thought the samples were too mixed and didn't cluster. But after finishing the differential expression analysis of Blue samples vs Orange samples, I did end up with a few significant genes (51 at padj < 0.05, with an LFC threshold of 0). That's not many genes compared to what I get when analysing another variable (which shows clustering along PC1), but it's something.
I was wondering if the clustering along PC4 (blue is slightly upwards, orange slightly downwards) is indeed because PC4 is explaining this variance.
What do you think? Am I reading too much into it? Is it wrong to go back to the PCA, and should I just stick to my 51 significant genes?
There is no clustering in this plot to my eye.
Why are you not plotting the components that capture more variance, specifically PC1 and PC2?
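One way to go beyond eyeballing the plot is to print how much variance each PC captures and to test whether the scores on each PC differ between the two color groups. The sketch below is only an illustration on synthetic data, not the OP's pipeline: it assumes a samples-by-genes matrix of already normalized, log-scale values, and the names `pca_scores`, `permutation_pvalue` and `is_blue` are made up for this example. The permutation test is just one reasonable choice for the comparison.

```python
import numpy as np

def pca_scores(X, n_components=4):
    """PCA via SVD on a samples x genes matrix.

    Returns the sample scores and the fraction of variance per component.
    """
    Xc = X - X.mean(axis=0)                      # center each gene
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = (S ** 2) / np.sum(S ** 2)        # variance explained by each PC
    scores = U[:, :n_components] * S[:n_components]
    return scores, var_ratio[:n_components]

def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    observed = abs(values[labels].mean() - values[~labels].mean())
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)           # shuffle group labels
        if abs(values[perm].mean() - values[~perm].mean()) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic stand-in for real data (assumption: 24 samples, 500 genes).
rng = np.random.default_rng(42)
n_samples, n_genes = 24, 500
is_blue = np.array([True, False] * (n_samples // 2))
X = rng.normal(size=(n_samples, n_genes))
X[is_blue, :20] += 1.0                           # hypothetical weak group effect

scores, var_ratio = pca_scores(X, n_components=4)
pvals = [permutation_pvalue(scores[:, k], is_blue) for k in range(4)]
for k in range(4):
    print(f"PC{k + 1}: {var_ratio[k]:.1%} of variance, group p = {pvals[k]:.3f}")
```

If a PC really separates the color groups, its p-value should be small; keep in mind that testing several PCs means some correction for multiple testing is sensible.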
I did plot them; they cluster by shape along PC1. I was just trying to see whether they cluster by color along any PC.
I have no idea what PC2 is though.
The relative contribution (loading) of each gene that entered the analysis is calculated for each PC; you just have to go through them.
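As a rough sketch of what "going through them" can look like: the loadings are the rows of the right singular vectors of the centered matrix, and ranking genes by absolute loading shows which ones drive a given PC. This is a toy example on synthetic data, assuming a samples-by-genes matrix; `top_genes_for_pc` and the gene names are hypothetical, not part of any package.

```python
import numpy as np

def top_genes_for_pc(X, gene_names, pc=1, n_top=5):
    """Return the n_top genes with the largest absolute loading on a PC.

    X is samples x genes; pc is 1-based (pc=4 means PC4).
    """
    Xc = X - X.mean(axis=0)                       # center each gene
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[pc - 1]                         # one weight per gene
    order = np.argsort(-np.abs(loadings))[:n_top]
    return [(gene_names[i], float(loadings[i])) for i in order]

# Synthetic stand-in: 20 samples, 50 genes, with a strong shared signal
# planted in the first three genes so that they dominate PC1.
rng = np.random.default_rng(1)
X = rng.normal(scale=0.1, size=(20, 50))
signal = 5.0 * rng.normal(size=20)
X[:, :3] += signal[:, None]
gene_names = [f"gene_{i}" for i in range(50)]

top = top_genes_for_pc(X, gene_names, pc=1, n_top=3)
for name, weight in top:
    print(f"{name}\t{weight:+.3f}")
```

Calling the same function with `pc=2` or `pc=4` on real data lists the genes behind those components, which can then be compared with the list of significant genes.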
I suggest you preprocess your data again, because the samples are not clustered properly in your PCA plot.
How do you figure that? The OP stated they are clustered by shape, and that is true.
The green squares are not clustered properly in the plot.