I am new to bioinformatics. My question is: in a PCA of 20 RNA-seq samples, if PC1 accounts for 80% of the variability and PC2 accounts for 15% of the variability, then PC3 must account for the remaining 5% of the variability. Is that correct?
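Nope, there can be as many PCs as there are samples or variables (genes, in RNA-seq), whichever is fewer, but we typically only calculate a few. With 20 samples that means up to 20 PCs, of which at most 19 carry any variance once the data are centered. In total the PCs will account for 100% of the variability, so PC3 onwards share the remaining 5% and PC3 on its own accounts for at most 5%. If the first few PCs account in total for, say, 99% of the variability in the data, then there's rarely much point in continuing.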
See if your PCA program will produce a scree plot for you. Once you understand those it will make sense :)
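To make the point about the number of PCs concrete, here is a minimal sketch in Python with NumPy. The data are simulated random numbers, not real RNA-seq, and the function name `explained_variance_ratio` is just a label chosen for this example. It computes the fraction of variance per PC (the numbers a scree plot displays) for a 20-sample matrix.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance carried by each PC.

    X is a samples x genes matrix (hypothetical layout for this example).
    """
    Xc = X - X.mean(axis=0)                    # center each gene across samples
    s = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    var = s ** 2                               # proportional to variance per PC
    return var / var.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1000))                # simulated: 20 samples x 1000 genes

ratio = explained_variance_ratio(X)
cumulative = np.cumsum(ratio)

print("number of PCs returned:", len(ratio))
print("variance fraction of PC1..PC3:", ratio[:3])
print("cumulative fraction after PC3:", cumulative[2])
print("cumulative fraction after all PCs:", cumulative[-1])
```

The fractions always sum to 1 across all PCs, not across the first three. The last of the 20 values is essentially zero because centering removes one degree of freedom. Plotting `ratio` against the PC index gives the scree plot.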
I'd also like to add that if 80% of the variance in your 20-sample RNA-seq PCA is explained by PC1, there's either something very wrong with your samples or with the way you are analyzing your data. Make sure the counts are normalized and log-transformed, ideally with something like the vst or rlog transformation from DESeq2.
If the phenotype is strong, like a knockout vs. a wild type of a major regulator such as a master transcription factor, I have definitely seen samples with such a high PC1 percentage. It depends, of course, on the sample type (cell line, primary cells, etc.). But yeah, I agree that with 20 samples it is at least worth noting, and OP should make sure things are correctly processed. @OP, what kind of data is this: species, treatment, cell type, etc.?
From https://chipster.csc.fi/manual/deseq2-transform.html
Both variance stabilizing transformation (VST) and regularized log transformation (rlog) aim to remove the dependence of the variance on the mean. In particular, genes with low expression level and therefore low read counts tend to have high variance, which is not removed efficiently by the ordinary logarithmic transformation. VST and rlog remove the experiment-wide trend of variance over mean calculated by the DESeq2 method. This dispersion calculation does not take into account the group information, and the transformation is therefore said to be blind.
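As a rough illustration of why the transformation matters for PCA, here is a sketch in Python with NumPy on simulated counts. It uses a plain log2(CPM + 1) transform as a simple stand-in. This is not DESeq2's vst or rlog, which are R functions and handle low counts better. The Poisson counts, the sequencing-depth factors and all variable names are assumptions made up for this example.

```python
import numpy as np

def pc_fractions(X):
    """Fraction of variance per PC for a samples x genes matrix."""
    Xc = X - X.mean(axis=0)                    # center each gene
    s = np.linalg.svd(Xc, compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(1)
n_samples, n_genes = 20, 2000

# Simulated data: no biological groups, only differing sequencing depth.
gene_means = rng.lognormal(mean=3.0, sigma=1.5, size=n_genes)
depth = rng.uniform(0.5, 2.0, size=n_samples)
counts = rng.poisson(depth[:, None] * gene_means[None, :])

# Simple stand-in normalization: counts per million, then log2 with a pseudocount.
lib_size = counts.sum(axis=1, keepdims=True)
log_cpm = np.log2(counts / lib_size * 1e6 + 1.0)

pc1_raw = pc_fractions(counts)[0]
pc1_log = pc_fractions(log_cpm)[0]

print("PC1 fraction, raw counts:   ", round(float(pc1_raw), 3))
print("PC1 fraction, log2(CPM + 1):", round(float(pc1_log), 3))
```

In this simulation PC1 of the raw counts mostly reflects sequencing depth in the highly expressed genes, so it takes a much larger share of the variance than after normalization and log transformation. A very high PC1 percentage on real data can therefore be a processing artifact, although, as noted above, a strong phenotype can also produce one.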
No, it's not. You can (and almost certainly will) have more than 3 PCs in your analysis. Could you show how you generated your PCA data, please?
Thank you so much, now it is clear to me.