Hello,
I'm new to bioinformatics and I would like to normalize HTSeq-count data for a survival analysis. How can I choose the best normalization?
I tried DESeq2 with the median ratio method and normTransform, but the medians are not all aligned:
DESeq_object <- estimateSizeFactors(DESeq_object)
counts_normalized <- counts(DESeq_object, normalized = TRUE)
boxplot(log2(counts_normalized + 1), main = "Counts normalized + log2 transformation")
I also tried this:
DESeq_object <- DESeqDataSetFromMatrix(countData = count,
                                       colData = coldata,
                                       design = ~ gender)
vst_object <- varianceStabilizingTransformation(DESeq_object)
boxplot(assay(vst_object))
How can I align all the medians to get a good normalization?
Thank you for your help!
If you want identical distributions across samples, you would need something like quantile normalization; see, for example, the implementation in https://bioconductor.org/packages/release/bioc/html/preprocessCore.html
From my understanding, that is quite a strong data manipulation, though. Why do you think the medians need to align precisely? If you have a heterogeneous sample population, one would probably expect considerable changes in expression profiles across the cohort, so the medians (I guess) would not be expected to align, even after normalization. Why not just stick with the well-tested vst? Maybe see whether prefiltering the data helps to bring the medians a bit closer, e.g.
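To make concrete what quantile normalization does, here is a minimal base R sketch on simulated data. The function name quantile_normalize and the simulated matrix log_expr are made up for illustration; for real data, the preprocessCore package linked above is the tested implementation. The sketch forces every sample onto the same reference distribution, which is why all medians end up identical.

```r
# Base R sketch of quantile normalization (illustration only, not preprocessCore's code).
quantile_normalize <- function(x) {
  # x: numeric matrix, genes in rows, samples in columns, no missing values
  ranks <- apply(x, 2, rank, ties.method = "first")  # rank of each gene within its sample
  sorted <- apply(x, 2, sort)                         # each sample sorted independently
  reference <- rowMeans(sorted)                       # mean of each quantile across samples
  out <- apply(ranks, 2, function(r) reference[r])    # map ranks back onto the reference
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated log2 expression: 1000 genes, 4 samples with deliberately shifted medians
log_expr <- sapply(c(5, 6, 7, 8), function(m) rnorm(1000, mean = m, sd = 2))
colnames(log_expr) <- paste0("sample", 1:4)

qn <- quantile_normalize(log_expr)
apply(log_expr, 2, median)  # medians differ before normalization
apply(qn, 2, median)        # medians are identical afterwards
```

Because every column of qn is a reordering of the same reference values, any real difference between the sample distributions is removed along with the technical ones.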
dds[rowSums(counts(dds) > 10) > 3,]
before running vst.
I don't know much about normalization for RNA-seq data, and a biologist advised me to aim for identical medians. How can I know whether the median ratio method, vst, or another method is best for normalization? Does vst not use normalized data?
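On the last question: varianceStabilizingTransformation estimates size factors itself when none are set, so the vst values are already corrected for library size. To show what the median ratio method computes, here is a rough base R sketch on simulated counts. The function median_ratio_size_factors and all simulated values are assumptions for illustration, not DESeq2's own code; estimateSizeFactors remains the function to use in practice.

```r
# Sketch of the median ratio idea: one scaling factor per sample (illustration only).
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # log geometric mean of each gene
  usable <- is.finite(log_geo_means)      # skip genes with a zero count in any sample
  apply(log_counts, 2, function(lc) exp(median((lc - log_geo_means)[usable])))
}

set.seed(1)
n_genes <- 2000
depth <- c(0.5, 1, 2, 4)                       # simulated sequencing depths
mu <- rlnorm(n_genes, meanlog = 5, sdlog = 1)  # simulated mean expression per gene
counts <- matrix(rpois(n_genes * 4, lambda = outer(mu, depth)), nrow = n_genes)
colnames(counts) <- paste0("sample", 1:4)

sf <- median_ratio_size_factors(counts)
sf                                             # roughly proportional to depth
counts_normalized <- sweep(counts, 2, sf, "/") # divide each sample by its size factor
boxplot(log2(counts_normalized + 1), main = "Median ratio normalization (simulated)")
```

In this simulation the samples share one expression profile and differ only in depth, so the boxplots line up after scaling. Size factors only rescale each sample as a whole, so in a heterogeneous cohort the medians need not become identical.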
Agree with ATpoint, you may use
assay(vst_object)
to do survival analysis. Also, the voom function from the limma package can do the job. Here is a blogpost on RNA-seq survival analysis using the voom function.
Thank you for your help!
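As a rough illustration of how transformed expression values could feed a survival analysis, the sketch below fits one univariate Cox model per gene with the survival package. The matrix expr is simulated and only stands in for assay(vst_object) (or the E matrix returned by voom); the clinical table with its time and status columns is hypothetical.

```r
library(survival)

set.seed(42)
n_samples <- 60
# Simulated stand-in for assay(vst_object): genes in rows, samples in columns
expr <- matrix(rnorm(5 * n_samples, mean = 8, sd = 1), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", seq_len(n_samples))))
# Hypothetical clinical data; with real data this would come from the sample annotation
clinical <- data.frame(
  time = rexp(n_samples, rate = 0.1),    # follow-up time
  status = rbinom(n_samples, 1, 0.7)     # 1 = event observed, 0 = censored
)

# Fit one univariate Cox model per gene and collect hazard ratio and p-value
cox_results <- t(sapply(rownames(expr), function(g) {
  df <- data.frame(clinical, x = as.numeric(expr[g, ]))
  fit <- coxph(Surv(time, status) ~ x, data = df)
  s <- summary(fit)$coefficients
  c(hazard_ratio = s[1, "exp(coef)"], p_value = s[1, "Pr(>|z|)"])
}))
cox_results
```

The expression values here are random, so no gene is expected to be associated with survival; with many genes, the p-values would also need a multiple-testing correction such as p.adjust.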
Cross-posted on Bioconductor, where the user was already given an answer: https://support.bioconductor.org/p/9136130/