PEER normalization after DESeq run
1
2
Entering edit mode
9.1 years ago
tonja.r ▴ 600

In PEER-paper they write:

When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes and variance-stabilize expression data sets.

And I got confused by DESeq2 and DESeq. In DESeq (varianceStabili... depends on the estimateDispersions):

cdsBlind = estimateDispersions( cds, method="blind" )
vsd = varianceStabilizingTransformation( cdsBlind )

In DESeq2:

vsd <- varianceStabilizingTransformation(dds)

I am getting the same results from varianceStabilizingTransformation() if I use DESeq functions as DESeq(), estimateSizeFactors(), estimateDispersions() before or not.

dds = DESeqDataSetFromMatrix(countData=histone_m, colData=Design, design=~condition)
pik = DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing

vst=varianceStabilizingTransformation(pik)
head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555

vst=varianceStabilizingTransformation(dds)
head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555

cds=estimateSizeFactors(dds)
cds = estimateDispersions( cds)
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

vst=varianceStabilizingTransformation(cds)
head(assay(vst))
      WEN1     WEN2     WNN1     WNN2
1 8.568387 8.206238 7.843942 8.086705
2 8.962147 8.688945 8.320699 8.750248
3 8.135942 7.859804 7.843942 7.878520
4 7.321410 6.863988 7.165757 6.908507
5 6.474284 6.379227 6.420262 6.351544
6 6.561254 6.468225 6.555806 6.861555

So, it seems it does not matter if I use DESeq function before or not. Does it make sense? Also, once I estimate SizeFactors and varianceStabilizingTransformation, how should I plug it into PEER?

R • 4.7k views
ADD COMMENT
5
Entering edit mode
9.1 years ago

From my understanding, there is negligible difference between the Variance Stabilizing Transformation functions of DESeq and DESeq2. I would suggest you use DESeq2 because it is the most up-to-date version.

library(peer)

# assign vst data to variable called expr_set
expr_set = assay(vst)    

# the PEER tutorial recommends k == 25% of "number of samples"
k = ncol(expr_set) * .25

# create empty peer model
model = PEER()

# set number of "hidden factors" searched for
PEER_setNk(model,k)

# PEER ask NxG matrix, where N=samples and G=genes
PEER_setPhenoMean(model,as.matrix(t(expr_set)))

PEER_setAdd_mean(model, TRUE)

# These can be changed to the tolerance you desire
PEER_setTolerance(model, .001)
PEER_setVarTolerance(model, 0.0005)

# usually doesn't go over 200 iterations        
PEER_setNmax_iterations(model, 1000)

# run peer
PEER_update(model)
ADD COMMENT
0
Entering edit mode

Thank you for peer. However, I am still confused about estimated size factors. You did not include it into the peer model and as seen from my example above, it does not matter if I run estimateSizeFactors() or DESeq() before varianceStabilizingTransformation() or not , it will always give me the same result, meaning size factors are not considered.

ADD REPLY
0
Entering edit mode

Why do you want to use the "Estimated Size Factors" for PEER?

ADD REPLY
0
Entering edit mode

They state in the paper: When RNA-seq estimates are used for transcript abundance, we recommend using DESeq to estimate library sizes. I guess they are referring to the effective library size e.g. to the normalization of the counts in order to account for the sequencing depth. And it can be done by estimateSizeFactors and then one can obtain the normalized counts by o=counts(cds,normalized=TRUE). But in the downstream analysis of DESeq, sizeFactors are used intern to normalize the counts. And in fact I would like to account for the sequencing depth because my libraries are quite different.

ADD REPLY
1
Entering edit mode

The Bioconductor RNAseq Workflow explicitly states, "Sequencing depth correction is done automatically for the rlog method (and for varianceStabilizingTransformation)." My point being, VST already corrects for different sequencing depths between libraries, you do not need to account for it in PEER.

ADD REPLY
0
Entering edit mode

one question more: the corrected data (normalized counts) set (for downstream analysis) I can obtain by PEER_getResiduals(model) as stated in the example, can't I? They seem to be centered at zero (in the example, simple unsupervised_demo) and at 0.1 in my case.

ADD REPLY
0
Entering edit mode

Yeah, PEER_getResiduals() gives you the "corrected normalized counts," which is what I commonly use for downstream analysis. I believe this Nature Protocol might be helpful to you:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3398141/

ADD REPLY

Login before adding your answer.

Traffic: 1352 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6