I have sequenced bacterial mRNA samples from 2 different conditions using Illumina. The number of reads and the quality are equivalent between the 2 samples, but I have no replicates.
I have already normalized the 2 samples by ORF length and total number of reads. Is that good enough, or should I add another normalization step?
The normalization of RNA-Seq (and, more generally, gene expression) data has to be chosen according to the kind of analysis you want to perform.
If you have normalized for ORF length, number of reads, etc. (I guess you used an RPKM measure), this is already a good first step.
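For reference, here is a minimal sketch of what an RPKM-style normalization computes, assuming you have a raw read count and an ORF length per gene plus the total number of mapped reads in the sample. The gene names, counts and the `rpkm` helper are made up for illustration; your pipeline may define things slightly differently.

```python
def rpkm(read_count, orf_length_bp, total_mapped_reads):
    """Reads per kilobase of ORF per million mapped reads."""
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> million reads)
    return read_count * 1e9 / (orf_length_bp * total_mapped_reads)


# Hypothetical raw data: gene -> (reads mapped to the ORF, ORF length in bp)
sample_a = {"geneA": (500, 1000), "geneB": (1500, 3000), "geneC": (200, 500)}
total_a = 2_000_000  # hypothetical total mapped reads in this sample

rpkm_a = {g: rpkm(n, length, total_a) for g, (n, length) in sample_a.items()}
for gene, value in rpkm_a.items():
    print(f"{gene}\t{value:.1f}")
```

Note that geneA and geneB end up with the same value even though geneB has three times as many reads, because its ORF is three times as long.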
If you want to do non-parametric statistical analyses (meaning based on the ranks of genes, not their absolute expression values), then you don't need any extra normalization step, since rescaling a sample does not change the ranks of the genes within it.
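A quick way to convince yourself of this: rank the genes of one sample before and after multiplying all values by an arbitrary positive factor. This is only a toy sketch with made-up values and a hypothetical `ranks` helper (ties are broken by input order, which a real rank statistic would handle differently).

```python
def ranks(values):
    """Rank values from 1 (lowest) upwards; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


expression = [12.0, 250.0, 3.5, 80.0]    # made-up expression values for 4 genes
scaled = [v * 0.37 for v in expression]  # arbitrary positive scaling factor

print(ranks(expression))
print(ranks(scaled))
print(ranks(expression) == ranks(scaled))
```

Both rank vectors are identical, so any test that only looks at within-sample ranks gives the same answer with or without the scaling.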
If you want to use the absolute levels, you can still do without normalization in some cases. However, since most normalization procedures won't affect the shape of the distribution within each sample (they only make the distributions of the two samples look more alike), it might be worth applying one.
There are many methods described and/or used for cross-sample normalization. A basic and "neutral" one is to scale the expression values of your two samples so that they have a similar distribution. Two possibilities are to adjust the samples so that they have the same median value, or to scale on the expression level of a set of highly expressed housekeeping genes (which you can define from your data or take from a previously described set in the literature).
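Both options boil down to computing one scaling factor for the second sample. Here is a rough sketch, assuming each sample is a dict of already length- and depth-normalized values; the gene names, values, the housekeeping list and the two helper functions are all hypothetical and only there to illustrate the idea.

```python
from statistics import median

# Made-up normalized expression values (e.g. RPKM) for two conditions
sample_1 = {"geneA": 250.0, "geneB": 40.0, "geneC": 10.0, "geneD": 120.0, "geneE": 5.0}
sample_2 = {"geneA": 500.0, "geneB": 100.0, "geneC": 20.0, "geneD": 200.0, "geneE": 8.0}


def median_scale(reference, target):
    """Scale target so that its median matches the median of reference."""
    # Assumes the median of target is not zero
    factor = median(reference.values()) / median(target.values())
    return {g: v * factor for g, v in target.items()}, factor


def housekeeping_scale(reference, target, housekeeping):
    """Scale target so that the median of the housekeeping genes matches."""
    # Assumes every housekeeping gene is present and expressed in both samples
    ref_level = median(reference[g] for g in housekeeping)
    tgt_level = median(target[g] for g in housekeeping)
    factor = ref_level / tgt_level
    return {g: v * factor for g, v in target.items()}, factor


scaled_by_median, f_median = median_scale(sample_1, sample_2)
scaled_by_hk, f_hk = housekeeping_scale(sample_1, sample_2, ["geneA", "geneD"])

print(f"median scaling factor:       {f_median:.3f}")
print(f"housekeeping scaling factor: {f_hk:.3f}")
```

The two factors will generally differ, and which one is more appropriate depends on how much you trust the housekeeping set versus the assumption that the bulk of genes does not change between conditions.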
Thanks Philippe. I thought about the median normalization, but is it statistically relevant to use this approach given that we do not obtain a normal distribution? Anyway, normalizing on housekeeping genes is a good idea; I will investigate.
Median scaling is a very basic (perhaps the most basic) normalization method you can use, and it might indeed not be relevant if your distribution is not normal. Scaling based on housekeeping genes might be a good alternative. Nonetheless, with only two samples, it might be more accurate to use a previously described set of housekeeping genes, since you will have very little statistical power to detect genes displaying little variation between samples.