Hello friends, I have RSEM-normalized RNA-seq data, which I log2-transformed for use in my analysis. I want to get rid of low-variance genes.
How should I calculate the variance? Should I calculate it on the RSEM values, or can it be done on the log2-transformed data?
Thanks
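The choice of scale matters for the variance question. The minimal Python sketch below computes the per-gene sample variance on the raw scale and on log2(x + 1). The matrix is made up for illustration and is not real RSEM output. The pseudocount of 1 is an assumption: it is a common but arbitrary choice. On the raw scale, variance grows with expression level, so the highly expressed gene dominates. On the log scale the ranking flips, and the lowly expressed gene with large fold changes has the higher variance.

```python
import numpy as np

# Hypothetical expression matrix: rows = genes, columns = samples.
# Values stand in for RSEM-normalized expression (not real data).
expr = np.array([
    [1000.0, 1200.0, 800.0, 1100.0],  # highly expressed gene, modest relative spread
    [2.0, 6.0, 1.0, 5.0],             # lowly expressed gene, large fold changes
])

def gene_variance(matrix):
    """Sample variance of each gene (row) across samples (ddof=1, like Excel's VAR.S)."""
    return matrix.var(axis=1, ddof=1)

raw_var = gene_variance(expr)                # dominated by the highly expressed gene
log_var = gene_variance(np.log2(expr + 1))   # pseudocount of 1 avoids log2(0)

print("raw variance: ", raw_var)
print("log2 variance:", log_var)
```

A variance cutoff therefore selects quite different genes depending on the scale. On the raw scale it mostly removes lowly expressed genes.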
May I ask why you want to do that and what your final analysis goal is? I am reasonably sure there is a way of achieving your goal with standard tools, so you do not need to write any custom code and can leave the nitty-gritty statistical details to the tool you use.
I need to get rid of low-variance genes (I calculate the variance in Excel). Then I will use the remaining genes for differential expression analysis, and the significant genes will be used to plot a heatmap.
For the variance calculation, I'm not sure whether I should use the RSEM data or the log2-transformed RSEM data. Also, I think differential expression analysis needs to be done with normalized (log2-transformed) data, but DESeq2 and edgeR cannot handle negative values. How can I deal with this?
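The negative values come from the log transform itself. The log2 of any value between 0 and 1 is negative, and log2(0) is minus infinity. The minimal Python sketch below uses hypothetical values to show this, along with the common log2(x + 1) workaround. Even non-negative log values are not the input DESeq2 or edgeR expect, though. These packages model counts on the original, un-logged scale. Log-scale values are better suited to exploratory steps such as the heatmap.

```python
import numpy as np

# Hypothetical normalized expression values for one gene across samples.
values = np.array([0.0, 0.25, 0.8, 1.0, 12.0])

with np.errstate(divide="ignore"):
    plain_log = np.log2(values)   # log2(0) = -inf; values in (0, 1) become negative
pseudo_log = np.log2(values + 1)  # log2(x + 1) is >= 0 whenever x >= 0

# The pseudocount transform is invertible, so the original scale can be recovered.
recovered = 2 ** pseudo_log - 1

print(plain_log)
print(pseudo_log)
```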
Where have you read that you need to get rid of low-variance genes? Filtering by variance is incompatible with all the empirical Bayes algorithms, including limma, edgeR and DESeq2, because it introduces bias into the dispersion estimation step. Use the filtering recommended by those packages instead.
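To illustrate filtering by abundance rather than by variance, here is a simplified Python sketch. It keeps genes whose counts per million (CPM) reach a threshold in a minimum number of samples. The function names, the thresholds (`min_cpm=1.0`, `min_samples=2`) and the count matrix are hypothetical. The rule is only in the spirit of edgeR's `filterByExpr` and does not reimplement it. In practice you would call the package's own filtering function on raw counts. The key property is that the rule looks only at how much a gene is expressed, not at how much its expression varies between samples.

```python
import numpy as np

def cpm(counts):
    """Counts per million: scale each sample (column) by its library size."""
    lib_sizes = counts.sum(axis=0)
    return counts / lib_sizes * 1e6

def abundance_filter(counts, min_cpm=1.0, min_samples=2):
    """Boolean mask of genes with CPM >= min_cpm in at least min_samples samples.

    Hypothetical, simplified rule in the spirit of edgeR-style filtering;
    it is NOT a reimplementation of edgeR::filterByExpr.
    """
    return (cpm(counts) >= min_cpm).sum(axis=1) >= min_samples

# Hypothetical raw counts: rows = genes, columns = samples.
counts = np.array([
    [500000, 600000, 400000],  # abundant gene: kept
    [0, 1, 0],                 # reaches 1 CPM in only one sample: dropped
    [10, 20, 5],               # modest but consistent expression: kept
], dtype=float)

keep = abundance_filter(counts)
filtered_counts = counts[keep]
print(keep)
```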