Quantile normalisation of RNA-seq data with substantially different group sizes
Posted by glendich, 11 months ago

I tried vst and rlog from DESeq2 on my RNA-seq data, but I suspect that the largest group (condition 1, 60 samples) has affected the variance of the other groups (condition 2, 20 samples; conditions 3 and 4, only 6 samples each) during normalisation. Should I apply quantile normalisation before VST in this case, or skip VST altogether?

Thanks and happy new year!
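For reference, quantile normalisation replaces each value in a sample with the mean of the values at the same rank across all samples, so every sample ends up with an identical distribution. Below is a minimal NumPy sketch, assuming a genes-by-samples matrix; the function name quantile_normalise is hypothetical, and ties are broken by sort order rather than averaged as some implementations do. Keep in mind that forcing all samples onto one distribution assumes that global differences between samples are purely technical, which is a strong assumption for metatranscriptomic samples whose composition can genuinely differ. DESeq2's transformations are also designed to take raw counts as input.

```python
import numpy as np


def quantile_normalise(x):
    """Quantile-normalise the columns (samples) of a genes-by-samples matrix.

    Ties are broken by (stable) sort order rather than averaged.
    """
    x = np.asarray(x, dtype=float)
    # Rank of every entry within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(x, axis=0, kind="stable"), axis=0)
    # Reference distribution: mean across samples of the k-th smallest values.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Give each entry the reference value for its rank.
    return reference[ranks]


# Toy example: 4 genes x 3 samples.
counts = np.array([[5, 4, 3],
                   [2, 1, 4],
                   [3, 4, 6],
                   [4, 2, 8]])
qn = quantile_normalise(counts)
print(qn)
```

After normalisation, every column contains the same set of values; only their order within each column, which follows the original ranks, differs between samples.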

Tags: vst, quantro, quantile-normalisation

You need to share much more information; for example, a PCA biplot would be very helpful. In my experience, imbalanced groups should not affect the normalisation or the variance-stabilising transformation; however, groups with extremely different expression profiles would, e.g., brain versus skin tissue.
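One reason group sizes alone tend not to matter much: DESeq2's default size factors use the median-of-ratios method, in which each sample's factor is the median, over genes, of its counts divided by a per-gene geometric-mean pseudo-reference. Because it is a median over many genes, it should stay stable as long as most genes are not differentially expressed. The NumPy sketch below illustrates the idea; the function name is hypothetical, and it omits details of DESeq2's own implementation (such as the option to restrict the calculation to control genes). The toy data are noise-free and made up: 92 samples split 60 + 20 + 6 + 6, each a scaled copy of one profile, with 10% of genes boosted fourfold in the smallest group.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Median-of-ratios size factors for a genes-by-samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of zero; drop them.
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Log geometric mean of each gene across all samples (the pseudo-reference).
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Each sample's factor: median log-ratio to the reference, back-transformed.
    return np.exp(np.median(log_counts - log_reference, axis=0))


# Toy imbalanced design: 60 + 20 + 6 + 6 samples, same profile, known depths.
rng = np.random.default_rng(1)
profile = rng.integers(1, 1000, size=2000)
depth = rng.uniform(0.5, 2.0, size=60 + 20 + 6 + 6)
counts = np.outer(profile, depth)
# Make the first 200 genes (10%) "differentially expressed" in the last group.
counts[:200, -6:] *= 4

sf = median_of_ratios_size_factors(counts)
# Relative size factors still recover the relative sequencing depths.
print(np.allclose(sf / sf[0], depth / depth[0]))
```

Because 90% of genes give the same ratio for a given sample, the median ignores the boosted genes, so the relative depths are recovered despite the 60-versus-6 imbalance.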


Hi, thank you for your response. I am working with metatranscriptomics data, so samples can vary a lot, but I am not sure whether the extent is comparable to brain versus skin tissue. On re-examining my PCA plot, I found an error in the clustering; with it corrected, the clusters no longer appear to be driven by group size, and the largest group is now spread more evenly across clusters.

Another reason I am interested in quantile normalisation is the result of the Shapiro-Wilk test. Before applying the variance stabilising transformation (VST), normality was not rejected for fewer than 0.1% of genes; after VST, this proportion rose to 70%. Is such a large increase expected, and does it affect the interpretation of the data, given that my study concerns non-linear gene expression?
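A jump of this kind is easy to reproduce on simulated data. The sketch below draws negative-binomial counts for 500 hypothetical genes across 92 samples (matching 60 + 20 + 6 + 6), then reports the share of genes for which the Shapiro-Wilk test does not reject normality at the 0.05 level, before and after a log transform. It uses log2(count + 1) as a simple stand-in for VST, and the mean range and the shared dispersion of 0.1 are made-up values. A non-significant Shapiro-Wilk result only means normality was not rejected, not that a gene is normally distributed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_samples = 500, 60 + 20 + 6 + 6
dispersion = 0.1  # assumed value, shared by all genes
means = np.exp(rng.uniform(np.log(50), np.log(5000), size=(n_genes, 1)))

# NumPy parameterises the negative binomial by (n, p); with n = 1/dispersion
# and p = n / (n + mean), counts have the requested mean and
# variance mean + dispersion * mean**2.
size = 1.0 / dispersion
counts = rng.negative_binomial(size, size / (size + means), size=(n_genes, n_samples))


def fraction_not_rejected(data, level=0.05):
    """Share of rows (genes) whose Shapiro-Wilk p-value is at least `level`."""
    pvalues = np.array([stats.shapiro(row).pvalue for row in data])
    return float(np.mean(pvalues >= level))


raw = fraction_not_rejected(counts)
logged = fraction_not_rejected(np.log2(counts + 1))
print(f"raw counts: {raw:.1%}   log2(count + 1): {logged:.1%}")
```

The exact percentages depend on the assumed means and dispersion, but the log-scale data should pass the test far more often than the skewed raw counts.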


"...after VST, this percentage significantly increased to 70%. I am curious to know if such a substantial increase is expected"

Yes, this is expected. Raw count data do not follow a normal distribution; this is well established. The variance stabilising transformation (VST) puts the counts on a log-like scale (for larger counts it is close to log2 of the normalised counts), and log-transformed counts more closely approximate a normal distribution.
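To make "log-like" concrete: for a parametric dispersion trend alpha(mu) = a0 + a1/mu, the count variance is mu + alpha(mu) * mu^2, and integrating 1/sqrt(variance) gives a closed-form transformation. The sketch below writes it in a form that should correspond to DESeq2's closed form for its parametric fit (treat that correspondence as an assumption to check against the package source); the coefficients a0 = 0.05 and a1 = 1 are hypothetical. It approaches log2 for large counts, stays finite at zero, and gives a roughly constant standard deviation across mean levels, unlike the raw counts.

```python
import numpy as np

A0, A1 = 0.05, 1.0  # hypothetical trend coefficients: alpha(mu) = A0 + A1 / mu


def vst(q, a0=A0, a1=A1):
    """Closed-form VST for the dispersion trend a0 + a1/mu (~log2(q) for large q)."""
    q = np.asarray(q, dtype=float)
    b = 1.0 + a1
    return np.log2((b + 2 * a0 * q + 2 * np.sqrt(a0 * q * (b + a0 * q))) / (4 * a0))


# Simulate counts whose variance follows that trend at several mean levels.
rng = np.random.default_rng(0)
means = np.array([20.0, 200.0, 2000.0, 20000.0])
size = 1.0 / (A0 + A1 / means)  # NumPy's n parameter = 1 / dispersion
p = size / (size + means)
counts = rng.negative_binomial(size[:, None], p[:, None], size=(len(means), 5000))

raw_sd = counts.std(axis=1)
vst_sd = vst(counts).std(axis=1)
for m, r, v in zip(means, raw_sd, vst_sd):
    print(f"mean {m:>7.0f}: sd raw {r:9.1f}   sd vst {v:.3f}")
```

The raw standard deviation grows by orders of magnitude with the mean, while the transformed one stays roughly flat; that flattening is what "variance stabilising" refers to.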
