In PEER-paper they write:
When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes and variance-stabilize expression data sets.
And I got confused by DESeq2 and DESeq. In DESeq (varianceStabili... depends on the estimateDispersions):
cdsBlind = estimateDispersions( cds, method="blind" )
vsd = varianceStabilizingTransformation( cdsBlind )
In DESeq2:
vsd <- varianceStabilizingTransformation(dds)
I am getting the same results from varianceStabilizingTransformation()
if I use DESeq functions as DESeq()
, estimateSizeFactors()
, estimateDispersions()
before or not.
dds = DESeqDataSetFromMatrix(countData=histone_m, colData=Design, design=~condition)
pik = DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
vst=varianceStabilizingTransformation(pik)
head(assay(vst))
WEN1 WEN2 WNN1 WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555
vst=varianceStabilizingTransformation(dds)
head(assay(vst))
WEN1 WEN2 WNN1 WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555
cds=estimateSizeFactors(dds)
cds = estimateDispersions( cds)
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
vst=varianceStabilizingTransformation(cds)
head(assay(vst))
WEN1 WEN2 WNN1 WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555
So, it seems it does not matter if I use DESeq function before or not. Does it make sense? Also, once I estimate SizeFactors and varianceStabilizingTransformation, how should I plug it into PEER?
Thank you for peer. However, I am still confused about estimated size factors. You did not include it into the peer model and as seen from my example above, it does not matter if I run
estimateSizeFactors()
orDESeq()
beforevarianceStabilizingTransformation()
or not , it will always give me the same result, meaning size factors are not considered.Why do you want to use the "Estimated Size Factors" for PEER?
They state in the paper: When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes. I guess they are referring to the effective library size e.g. to the normalization of the counts in order to account for the sequencing depth. And it can be done by
estimateSizeFactors
and then one can obtain the normalized counts byo=counts(cds,normalized=TRUE)
. But in the downstream analysis of DESeq, sizeFactors are used intern to normalize the counts. And in fact I would like to account for the sequencing depth because my libraries are quite different.The Bioconductor RNAseq Workflow explicitly states, "Sequencing depth correction is done automatically for the rlog method (and for varianceStabilizingTransformation)." My point being, VST already corrects for different sequencing depths between libraries, you do not need to account for it in PEER.
one question more: the corrected data (normalized counts) set (for downstream analysis) I can obtain by
PEER_getResiduals(model)
as stated in the example, can't I? They seem to be centered at zero (in the example, simple unsupervised_demo) and at 0.1 in my case.Yeah, PEER_getResiduals() gives you the "corrected normalized counts," which is what I commonly use for downstream analysis. I believe this Nature Protocol might be helpful to you:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3398141/