I need to calculate log2 fold change values for a lot of different experimental conditions, each compared to its corresponding control. I am not going to use these for differential expression analysis but for other downstream analyses such as clustering. Traditionally in my field, counts are normalized with the TPM method and fold change values are then calculated as log2(TPM_exp+1)-log2(TPM_control+1) [using 1 or 0.5 as the pseudo-count for the log transformation]. In my case, I realized that TPM is not a good way to normalize the data, because a few of my samples have a lot of reads mapping to only one or two genes [RNA composition bias]. DESeq2's median-of-ratios normalization seems to take care of that issue, so I prefer using DESeq2 normalization. But I cannot use DESeq2 to get log2 fold change values, because I don't have replicates for some of the experimental conditions and DESeq2 needs replicates to estimate log2 fold change values. So I want to calculate log2 fold change values manually from DESeq2-normalized counts, using log2(DESeq2norm_exp+0.5)-log2(DESeq2norm_control+0.5). I am not sure whether this is a good idea, or whether the choice of pseudo-count here is critical. Any comments or help are really appreciated.
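A minimal sketch of the calculation described above, written in Python/NumPy rather than R so it runs without DESeq2. The helper names median_of_ratios_size_factors and log2_fold_change are hypothetical. The size-factor function follows the median-of-ratios idea (per-gene geometric mean as a pseudo-reference, then the per-sample median ratio to it), but it is an illustration, not DESeq2's own code. The toy counts are made up, with one gene dominating the treatment library to mimic composition bias, and TPM is approximated by counts-per-million, which is what TPM reduces to if all genes have the same length.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Per-sample size factors in the spirit of median-of-ratios.

    counts: genes x samples array of raw counts.
    Genes with a zero in any sample are skipped, because their log
    geometric mean is -inf and the ratio is undefined.
    """
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    log_geo_means = log_counts.mean(axis=1)          # per-gene log geometric mean
    usable = np.isfinite(log_geo_means)              # drop genes containing zeros
    log_ratios = log_counts[usable] - log_geo_means[usable, None]
    return np.exp(np.median(log_ratios, axis=0))     # one factor per sample

def log2_fold_change(norm_exp, norm_ctrl, pseudo=0.5):
    """log2(exp + pseudo) - log2(ctrl + pseudo), as in the question."""
    return np.log2(norm_exp + pseudo) - np.log2(norm_ctrl + pseudo)

# Hypothetical toy data: columns are (control, treatment). Gene 5 dominates
# the treatment library; gene 6 has very low counts.
counts = np.array([
    [100, 100],
    [200, 200],
    [50, 50],
    [10, 10],
    [5, 10000],
    [1, 3],
], dtype=float)

sf = median_of_ratios_size_factors(counts)
norm = counts / sf                                   # divide each column by its size factor
lfc_half = log2_fold_change(norm[:, 1], norm[:, 0], pseudo=0.5)
lfc_one = log2_fold_change(norm[:, 1], norm[:, 0], pseudo=1.0)

# For comparison: plain library-size scaling (TPM with equal gene lengths = CPM).
cpm = counts / counts.sum(axis=0) * 1e6
lfc_cpm = log2_fold_change(cpm[:, 1], cpm[:, 0], pseudo=1.0)

print("size factors:", sf)
print("median-of-ratios LFC (pc=0.5):", np.round(lfc_half, 2))
print("median-of-ratios LFC (pc=1):  ", np.round(lfc_one, 2))
print("CPM LFC (pc=1):               ", np.round(lfc_cpm, 2))
```

On this toy input both size factors come out as 1, so the four unchanged genes get an LFC of 0, whereas CPM scaling pushes them below -4 simply because the dominant gene inflates the treatment library. The low-count gene (1 vs 3 counts) gets about 1.22 with a pseudo-count of 0.5 and exactly 1.0 with a pseudo-count of 1, so in this example the pseudo-count choice mainly matters for low-count genes.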
Thanks ATpoint for your reply. I don't want to use count data for clustering because it still carries some technical or study-related bias; I believe calculating log fold change values removes that bias to some extent. I think it is worth trying vst() on the count data, because I believe it performs median-of-ratios normalization before the variance-stabilizing transformation. I believe this also takes care of the inflation of fold change values for genes with small counts. Is my understanding correct?
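The intuition that a variance-stabilizing transform damps the fold changes of low-count genes can be sketched outside DESeq2. This sketch assumes a parametric mean-dispersion trend alpha(mu) = a0 + a1/mu with made-up coefficients A0 and A1, inputs that are already size-factor-normalized counts, and the closed-form transform for the resulting variance mu + alpha(mu)*mu^2. My understanding is that DESeq2's vst() with a parametric dispersion fit uses a transform of this form, but treat that as an assumption and check the DESeq2 documentation; parametric_vst is a hypothetical helper name.

```python
import numpy as np

def parametric_vst(q, a0, a1):
    """Closed-form variance-stabilizing transform for counts whose variance is
    mu + alpha(mu) * mu**2 with the (assumed) dispersion trend
    alpha(mu) = a0 + a1 / mu. Scaled so that it approaches log2(q) for large q.

    q: normalized counts (raw counts already divided by size factors).
    """
    q = np.asarray(q, dtype=float)
    b = 1.0 + a1
    inner = b + 2.0 * a0 * q + 2.0 * np.sqrt(a0 * q * (b + a0 * q))
    return np.log2(inner / (4.0 * a0))

# Hypothetical dispersion-trend coefficients (in practice fitted from the data).
A0, A1 = 0.1, 1.0

low = parametric_vst([1, 4], A0, A1)            # control, treatment with few counts
high = parametric_vst([10000, 40000], A0, A1)   # same 4-fold change, many counts

print("raw log2 FC:   ", np.log2(4.0))
print("VST diff, low: ", low[1] - low[0])
print("VST diff, high:", high[1] - high[0])
```

With these coefficients, a 4-fold change at 1 vs 4 counts shrinks to a VST difference of roughly 0.6, while the same 4-fold change at 10000 vs 40000 counts stays close to 2. How strong this damping is on real data depends on the fitted dispersion trend.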
I do not think that any stats magic is going to compensate for the lack of replication, nor for unwanted technical variation unless you can meaningfully regress it out and the experimental design allows that. I personally would probably try the lfcShrink() method to correct the fold changes from DESeq2 (in fact that is a major point of the method; see the DESeq2 vignette, e.g. with the "ashr" method), and go with that if you really need fold changes. I do not know, though, how reliable shrunken fold change estimates are without replication. The problem with any non-standard method is that you lack a ground truth to benchmark against. I would therefore stick with what DESeq2 provides out of the box; that is not always better (here I think it is), but it is at least established and automated.
Cross-posted:
https://bioinformatics.stackexchange.com/questions/20273/manually-calculating-log2-fold-change-values-from-deseq2-normalized-counts
https://support.bioconductor.org/p/9148604/