11 months ago
glendich
I tried vst and rlog from DESeq2 on my RNA-seq data, but I suspect the largest group (condition 1, 60 samples) affected the variance of the other groups (condition 2 with 20 samples; conditions 3 and 4 with only 6 samples each) during normalisation. Should I do quantile normalisation before vst in this case, or skip vst?
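For reference, quantile normalisation forces every sample onto the same empirical distribution: values are ranked within each sample, and each value is replaced by the mean (across samples) of the values at that rank. Below is a minimal pure-Python sketch, shown in Python rather than R purely for illustration. The function name `quantile_normalise` and the toy matrix are hypothetical, and ties are broken by position rather than averaged. One caveat: DESeq2 expects un-normalised integer counts as input and applies its own size factors, so quantile-normalised values are not meant to be passed to vst() or rlog().

```python
# Minimal quantile normalisation sketch (hypothetical helper, pure Python).
# Assumption: `counts` is a genes x samples matrix given as a list of rows.
# Ties are broken by original position, a simplification of implementations
# that average tied ranks.


def quantile_normalise(counts):
    n_genes = len(counts)
    n_samples = len(counts[0])
    # Column-wise view: one list of values per sample.
    columns = [[counts[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # For each sample, gene indices ordered from smallest to largest value.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean across samples of the k-th smallest value.
    reference = [
        sum(columns[s][orders[s][k]] for s in range(n_samples)) / n_samples
        for k in range(n_genes)
    ]
    # Replace each value by the reference value at its rank within its sample.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        for rank, g in enumerate(orders[s]):
            result[g][s] = reference[rank]
    return result


toy = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]  # hypothetical 4 genes x 3 samples
for row in quantile_normalise(toy):
    print([round(v, 3) for v in row])
```

After this step every sample has exactly the same set of values, which is why it can also erase genuine global differences between conditions.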
Thanks and happy new year!
You need to share much more information; for example, a PCA biplot would be very helpful. In my experience, imbalanced groups should not affect the normalisation process or the variance-stabilising transformation. However, groups with extremely different expression profiles would, e.g., brain versus skin tissue.
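Some context on why group sizes alone should not distort normalisation. DESeq2's default size factors (estimateSizeFactors, "ratio" method) are median-of-ratios estimates. Each sample's counts are divided by a per-gene geometric mean taken across all samples, and the sample's factor is the median of those ratios. A larger group does pull the geometric-mean reference towards its own profile. However, for samples that share a profile up to sequencing depth, the ratios are ordered the same way, so their relative size factors do not change. What biases the median is many genes differing strongly between groups, as in the brain-versus-skin case. The pure-Python sketch below mirrors that idea. The name `size_factors` and the toy counts are hypothetical, and this is not DESeq2's code.

```python
# Sketch of median-of-ratios size factors, in the style of DESeq2's default
# "ratio" method. Hypothetical helper, not DESeq2 itself.
# Assumption: counts is a genes x samples list of rows; genes with a zero in
# any sample are skipped because their geometric mean would be zero.
import math
import statistics


def size_factors(counts):
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue  # geometric mean is zero, so the gene is excluded
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for s, c in enumerate(row):
            log_ratios[s].append(math.log(c) - log_geo_mean)
    # Size factor = median over genes of count / geometric-mean reference.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Hypothetical toy data: sample 2 is sequenced exactly twice as deeply.
print(size_factors([[10, 20], [100, 200], [5, 10]]))  # → approximately [0.707, 1.414]
```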
Hi, thank you for your response. I am working with metatranscriptomics data, so the samples can vary a lot, but I am not sure whether the extent is comparable to brain vs. skin tissue. On re-examining my PCA plot, I found an error in the clustering. After fixing it, the clusters no longer appear to be driven by sample size, and the largest group is spread more evenly across clusters.
Another reason I am interested in quantile normalization is the result of a Shapiro-Wilk test. Before applying the variance stabilizing transformation (VST), the test failed to reject normality for fewer than 0.1% of genes; after VST, this rose to about 70%. Is such a large increase expected, and does it affect how the data should be interpreted, given that my study concerns non-linear gene expression?
Yes, this is expected. Your raw count data do not follow a normal distribution; this is well established. The variance stabilizing transformation (vst) essentially log-transforms the counts (for higher counts it behaves like log2), and a log transformation yields a distribution that more closely approximates a normal distribution.
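A small simulation shows why the share of genes passing a normality test jumps after vst. This is a Python sketch, not DESeq2. It assumes that one gene's counts are negative binomial with mean 50 and dispersion 0.5, simulated as a gamma-Poisson mixture. The hypothetical `vst_parametric` is the variance-stabilising integral for the mean-variance relation Var = mu + (a0 + a1/mu) * mu^2. To my understanding, that relation underlies DESeq2's vst when a parametric dispersion trend is fitted, but the constants a0 = 0.5 and a1 = 0 here are illustrative only. Skewness is used as a simple stand-in for the Shapiro-Wilk test, which in Python would be `scipy.stats.shapiro`.

```python
# Sketch: a log-type transform reduces the skew of NB-distributed counts.
# Assumptions (illustrative): mean 50, dispersion 0.5, a0 = 0.5, a1 = 0.
import math
import random


def poisson(lam, rng):
    # Knuth's multiplication method; adequate for moderate lam (< ~700).
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def neg_binomial(mean, dispersion, rng):
    # Gamma-Poisson mixture: lambda ~ Gamma(shape=1/disp, scale=mean*disp),
    # giving variance mean + dispersion * mean**2.
    lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    return poisson(lam, rng)


def vst_parametric(q, a0, a1=0.0):
    # Integral of 1/sqrt(Var(q)) with Var = (1 + a1) q + a0 q^2, up to a
    # constant scale and shift chosen so that vst(q) ~ log2(q) for large q.
    b = 1.0 + a1
    inner = b + 2 * a0 * q + 2 * math.sqrt(a0 * q * (b + a0 * q))
    return math.log2(inner / (4 * a0))


def skewness(xs):
    # Sample skewness g1 = m3 / m2**1.5 (0 for a symmetric distribution).
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
raw = [neg_binomial(50, 0.5, rng) for _ in range(5000)]
log_counts = [math.log2(c + 1) for c in raw]
vst_counts = [vst_parametric(c, a0=0.5) for c in raw]

print(f"raw skewness:       {skewness(raw):.2f}")
print(f"log2(x+1) skewness: {skewness(log_counts):.2f}")
print(f"vst skewness:       {skewness(vst_counts):.2f}")
```

The raw counts are strongly right-skewed, while both transformed versions are much closer to symmetric. Because vst is strictly increasing, rank-based measures such as Spearman correlation are unchanged by it, but the shape of any non-linear relationship on the count scale will look different on the transformed scale.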