Would you suggest using TPM or FPKM values for PCA and WGCNA?
Thanks,
Marion
Hi Marion,
For this purpose, I'd imagine you would not likely see much difference. However, there is literally no reason to prefer FPKM over TPM. If you're looking to perform some analysis where relative abundance is an appropriate measure, you should always favor TPM.
Thank you everyone for your help! I tried it both ways. The PCA from the FPKM values made the most sense and the plot was similar to previous work. When I used the TPM values the strong separation by PC1 that we had seen with FPKM and in previous analysis moved to PC2.
That's interesting (i.e. the shift). However, the reason to prefer TPM over FPKM is that FPKM has a (somewhat arbitrary) dependence on the mean expressed transcript length of a sample, while TPM does not. It's probably worth checking that the separation you see in PC1 is not an artifact of this technical detail. You can calculate the different scaling factors between your samples using a method such as the one presented here.
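As a rough illustration of that check (not necessarily the linked method), here is a minimal sketch that relies on the standard identity TPM_i = FPKM_i / sum(FPKM) * 1e6 within each sample. The function names and the toy matrix are made up for the example; it assumes a genes x samples FPKM matrix.

```python
import numpy as np

def tpm_scaling_factors(fpkm):
    """Per-sample factor that turns FPKM into TPM: 1e6 / sum of FPKM in that sample."""
    fpkm = np.asarray(fpkm, dtype=float)
    return 1e6 / fpkm.sum(axis=0)

def fpkm_to_tpm(fpkm):
    """Rescale each sample (column) of a genes x samples FPKM matrix so it sums to 1e6."""
    fpkm = np.asarray(fpkm, dtype=float)
    return fpkm * tpm_scaling_factors(fpkm)

# Toy FPKM matrix (hypothetical values): 3 genes (rows) x 2 samples (columns).
fpkm = np.array([[10.0, 20.0],
                 [30.0, 20.0],
                 [60.0, 160.0]])

factors = tpm_scaling_factors(fpkm)
tpm = fpkm_to_tpm(fpkm)

print("scaling factors per sample:", factors)
print("TPM column sums:", tpm.sum(axis=0))
```

If the factors differ a lot between the groups that separate on PC1 in the FPKM plot, that separation may be partly driven by the scaling rather than by biology.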
I don't know about WGCNA, but PCA assumes normality, so you'll have to (at least) log-transform the values whether you choose TPM or FPKM.
Do you have a reference for this? I don't think PCA needs any such assumption. If you have variables measured on different scales, like metres and kilograms, then it's advisable to scale and centre to remove the dependency on the units of measure, but this is not the case for gene expression.
OK, you are right, this is not really an assumption. It's more a piece of advice for getting meaningful results: gene expression has a heavily skewed distribution and PCA is quite sensitive to outliers, which is why I usually log-transform expression data. For reference: http://www.bioconductor.org/help/workflows/rnaseqGene/#the-rlog-transformation
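To make the log-transform suggestion concrete, here is a minimal sketch using a plain log2(x + 1) transform followed by PCA via SVD. This is a simpler stand-in for the rlog transformation in the linked workflow, not an implementation of it; the function name, the pseudocount of 1, and the simulated data are assumptions for illustration only.

```python
import numpy as np

def log_pca(expr, n_components=2, pseudocount=1.0):
    """PCA on log2-transformed expression values.

    expr: genes x samples matrix of TPM or FPKM values.
    Returns (scores, explained), where scores is samples x n_components and
    explained holds the fraction of variance for each returned component.
    """
    expr = np.asarray(expr, dtype=float)
    logged = np.log2(expr + pseudocount)   # tame the heavy right skew
    X = logged.T                           # samples as rows, genes as columns
    X = X - X.mean(axis=0)                 # centre each gene
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Simulated, heavily skewed expression values (hypothetical): 200 genes x 6 samples.
rng = np.random.default_rng(0)
expr = rng.lognormal(mean=3.0, sigma=1.5, size=(200, 6))

scores, explained = log_pca(expr)
print(scores.shape)
print(explained)
```

Running the same function on the TPM and the FPKM matrices and comparing the two score plots is a quick way to see whether the PC1/PC2 swap persists after the transform.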