I am performing differential gene expression analysis between two temperature conditions (A and B). I have the count matrix for all my genes, with 5 replicates per condition.
I don't know what to make of it, or how to continue the analysis.
However, when I plot the PCA using only the differentially expressed genes, the results look better. But I think this is biased: since we selected the genes precisely because they are differentially expressed, the samples will obviously separate by condition (whatever the input data would have been).
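The circularity concern can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the data are pure Gaussian noise (2000 hypothetical genes, 5 + 5 samples, no real difference between A and B), genes are ranked by a plain Welch t statistic rather than by a dedicated RNA-seq tool, and PC1 is approximated by power iteration. All function names are made up for this example.

```python
import math
import random
import statistics


def pc1_scores(rows):
    """Approximate PC1 sample scores (up to sign and scale) by power iteration.

    rows: list of samples, each a list of per-gene values.
    """
    n, g = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(g)]
    centered = [[r[j] - means[j] for j in range(g)] for r in rows]
    # Sample-by-sample Gram matrix; its leading eigenvector is proportional
    # to the PC1 scores of the samples.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    vec = [(-1.0) ** i * (i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(500):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    return vec


def separation(scores, labels):
    """Gap between the group means on PC1, in units of within-group SD."""
    a = [s for s, lab in zip(scores, labels) if lab == "A"]
    b = [s for s, lab in zip(scores, labels) if lab == "B"]
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd


def t_stat(a, b):
    """Welch t statistic for two lists of values."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


rng = random.Random(0)
labels = ["A"] * 5 + ["B"] * 5
n_genes = 2000
# Pure noise: by construction, no gene truly differs between A and B.
data = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in labels]

# Rank genes by |t| between the two groups and keep the 50 most extreme.
t_values = [t_stat([data[i][j] for i in range(5)],
                   [data[i][j] for i in range(5, 10)])
            for j in range(n_genes)]
top = sorted(range(n_genes), key=lambda j: abs(t_values[j]), reverse=True)[:50]
selected = [[row[j] for j in top] for row in data]

sep_all = separation(pc1_scores(data), labels)
sep_selected = separation(pc1_scores(selected), labels)
print(f"PC1 separation, all genes:     {sep_all:.2f}")
print(f"PC1 separation, top 50 by |t|: {sep_selected:.2f}")
```

If the second number comes out much larger than the first even though the data contain no signal, that supports the point: a PCA restricted to genes selected for differing between conditions cannot serve as independent evidence that the conditions differ.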
I have no obvious explanation for this apart from randomness, handling mistakes, or the change of temperature having little impact on the transcriptional response ...
So I am now wondering whether it is worth the effort to pursue a long analysis of differentially expressed genes.
For those replicates that are very different from each other, you may find it helpful to plot simple X-Y scatter plots. This will tell you whether the variance occurs across all ranges of read counts, only for low-count genes, only for high-count genes, etc.
If the replicate scatter plots are "all over the place", I would then dig into whether this is experimental error or actual biological variation.
Plot the gene read counts of any two samples, X and Y, against each other. This will give you a visual sense of the differences between any two samples, beyond your initial PCA plot.
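A numeric companion to the scatter plot is to summarise how much two samples disagree within each abundance range. The following is a minimal sketch, assuming raw counts, a log2(count + 1) transform, and equal-sized abundance bins; the function name, the bin count, and the toy counts are made up for illustration.

```python
import math
from statistics import median


def spread_by_abundance(x, y, n_bins=4):
    """Summarise disagreement between two samples per abundance bin.

    x, y: equal-length lists of raw counts for the same genes.
    Returns one (mean_log2_abundance, median_abs_log2_ratio, n_genes) tuple
    per bin, ordered from low to high abundance.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    points = []
    for a, b in zip(x, y):
        if a == 0 and b == 0:
            continue  # gene not detected in either sample
        la, lb = math.log2(a + 1), math.log2(b + 1)
        points.append(((la + lb) / 2, abs(la - lb)))
    if not points:
        return []
    points.sort()  # order genes by mean log2 abundance
    size = math.ceil(len(points) / n_bins)
    summary = []
    for start in range(0, len(points), size):
        chunk = points[start:start + size]
        mean_abundance = sum(p[0] for p in chunk) / len(chunk)
        summary.append((mean_abundance, median(p[1] for p in chunk), len(chunk)))
    return summary


# Toy counts for eight genes in two hypothetical replicates.
sample_x = [1, 3, 0, 120, 95, 1000, 2100, 0]
sample_y = [4, 0, 2, 110, 100, 1050, 2000, 0]
bins = spread_by_abundance(sample_x, sample_y, n_bins=2)
for mean_abundance, spread, n in bins:
    print(f"mean log2 abundance {mean_abundance:5.2f}: "
          f"median |log2 ratio| {spread:.3f} over {n} genes")
```

A spread that is large only in the lowest bins points to ordinary low-count noise; a spread that stays large in the high-abundance bins is more likely to reflect a real difference between the two samples (batch, handling, or biology).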
Ok, thanks for your answer. I have a new PCA plot with all the sample names (the batch number follows the "B"). I have plotted expression (kallisto raw counts) for B2_1 vs B1_12 and for B4_3 vs B4_1.
In both cases the variance seems to be partially explained by low-count transcripts, but in the second case a more significant part also comes from other transcripts ...
I have also investigated which transcripts are mostly driving PC1 and PC2, but nothing interesting has popped out so far.
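For reference, here is one way to list the transcripts with the largest PC1 loadings. It is a minimal sketch, assuming the input has already been log-transformed; on raw counts the most highly expressed transcripts tend to dominate PC1 simply because of their scale. The function name, transcript names, and values are hypothetical.

```python
import math


def top_pc1_loadings(rows, names, k=3, n_iter=500):
    """Return the k features with the largest absolute PC1 loading.

    rows: list of samples, each a list of (ideally log-transformed) values.
    names: feature names, in the same order as the columns of rows.
    """
    n, g = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(g)]
    centered = [[r[j] - means[j] for j in range(g)] for r in rows]
    # Power iteration on the sample-by-sample Gram matrix.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    vec = [(-1.0) ** i * (i + 1) for i in range(n)]  # arbitrary start vector
    for _ in range(n_iter):
        nxt = [sum(gram[i][m] * vec[m] for m in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Project the sample-space eigenvector back onto the features to get
    # the PC1 loadings, then scale them to unit length.
    loadings = [sum(vec[i] * centered[i][j] for i in range(n)) for j in range(g)]
    norm = math.sqrt(sum(v * v for v in loadings))
    loadings = [v / norm for v in loadings]
    order = sorted(range(g), key=lambda j: abs(loadings[j]), reverse=True)[:k]
    return [(names[j], loadings[j]) for j in order]


# Hypothetical log2 expression values: 4 samples x 3 transcripts.
names = ["tx_a", "tx_b", "tx_c"]
rows = [
    [2.0, 5.0, 7.1],
    [6.0, 5.2, 7.0],
    [2.2, 4.9, 7.2],
    [6.1, 5.1, 7.0],
]
ranked = top_pc1_loadings(rows, names)
for name, loading in ranked:
    print(f"{name}: {loading:+.3f}")
```

The sign of a loading is arbitrary; only the relative magnitudes matter. If the top-ranked transcripts share an obvious technical feature (for example rRNA or mitochondrial transcripts), that may suggest a handling or batch effect rather than a temperature response.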
Just because the first 2 PCs don't split your samples doesn't mean there are no significant differences. You just might not find many.