Hi,
Can someone please explain what heteroscedasticity means in RNA-Seq data?
Thanks!
Heteroscedasticity refers to data in which the variance is not constant. The form that matters for RNA-seq is a relationship between the mean of a variable and its variance: as the mean grows, so does the variance. This is the case for any variable that follows a Poisson or negative binomial distribution.
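To make the mean-variance link concrete, here are the standard variance formulas for those two distributions. The symbols are my own notation, not from the question: $\mu$ is the mean count and $\alpha \ge 0$ is the dispersion, using the negative binomial parameterization commonly seen in RNA-seq tools.

$$
\text{Poisson:}\quad \operatorname{Var}(X) = \mu
\qquad\qquad
\text{Negative binomial:}\quad \operatorname{Var}(X) = \mu + \alpha\,\mu^{2}
$$

In both cases the variance is a function of the mean, so a gene with a higher expected count also has a higher variance.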
Many traditional statistical tests are described as assuming that the data are normally distributed. However, the assumption that actually matters is often that the data are not heteroscedastic. For normally distributed data the variance does not depend on the mean, so such data are not heteroscedastic in this sense, but there are also non-normal distributions that are not heteroscedastic.
However, the per-transcript counts from RNA-seq data are heteroscedastic. To see this, take an RNA-seq experiment, calculate the mean count and the variance for each gene, and plot the two against each other: you will see that the variance increases with the mean. This means that any statistical test that requires non-heteroscedastic data, such as the t-test, cannot be applied directly to raw RNA-seq counts.
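If you don't have a count matrix to hand, here is a rough sketch of that check on simulated data. Everything in it is an assumption for illustration: the counts are drawn from a negative binomial with a made-up dispersion of 0.1, and `simulate_counts` is a hypothetical helper, not part of any RNA-seq package. With real data you would replace `counts` with your own genes-by-samples matrix.

```python
import numpy as np

def simulate_counts(n_genes=2000, n_samples=6, dispersion=0.1, seed=0):
    """Hypothetical helper: draw negative binomial counts (genes x samples)."""
    rng = np.random.default_rng(seed)
    # Assumed true mean per gene, spread log-uniformly between 1 and 10^4.
    mu = 10 ** rng.uniform(0, 4, size=n_genes)
    # Convert (mean, dispersion) to NumPy's (n, p) parameterization so that
    # the mean is mu and the variance is mu + dispersion * mu**2.
    n = 1.0 / dispersion
    p = n / (n + mu)
    return rng.negative_binomial(n, p[:, None], size=(n_genes, n_samples))

counts = simulate_counts()

# Per-gene mean and sample variance across samples.
gene_mean = counts.mean(axis=1)
gene_var = counts.var(axis=1, ddof=1)

# Drop genes with zero mean or zero variance so the logs are defined.
keep = (gene_mean > 0) & (gene_var > 0)

# Slope of log10(variance) against log10(mean). A slope near 0 would mean the
# variance does not depend on the mean; a clearly positive slope means it does.
slope, intercept = np.polyfit(np.log10(gene_mean[keep]),
                              np.log10(gene_var[keep]), 1)
print(f"slope of log-variance vs log-mean: {slope:.2f}")
```

Plotting `gene_mean[keep]` against `gene_var[keep]` on log-log axes (for example with matplotlib) gives the picture described above: the points climb from bottom left to top right instead of forming a flat band.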