Dear all, I am trying to run DESeq2 to get the list of differential expressed miRNAs between 330 tumor and 42 normal samples. However when I run the following code:
> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition)
> dds <- estimateSizeFactors(dds)
> nc <- counts(dds, normalized=TRUE)
> filter <- rowSums(nc >= 10) >= 2
> dds <- dds[filter,]
> dds$condition<-relevel(dds$Condition, ref="normal")
> design(dds) <- formula(~ Type + Condition)
> dds <- DESeq(dds)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest
-- replacing outliers and refitting for 124 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
29 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest.
I tried to fix this problem with using higher threshold in this code:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition)
> keep <- rowSums((counts(dds)) >= 10) >=300
> dds <- dds[keep,]
> dds$Condition<-relevel(dds$Condition, ref="normal")
> design(dds) <- formula(~ Type + Condition)
> dds <- estimateSizeFactors(dds)
> dds <- estimateDispersions(dds)
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
> dds <- nbinomWaldTest(dds, maxit=1000)
> res <- results(dds)
> write.table(res,"D:\\FromD\\EndocrinologyAndMetabolicDisorderIns\\GDC\\deseq2-21khordadWhiteUsingFilter10-300.txt", sep="\t")
> plotMA(res, ylim = c(-2,2))
> res <- results(dds, name="condition_Cancer_vs_normal")
However this time I get this error for the last line: Error: subscript contains invalid names.
Now my question is what is the best threshold that we have to choose when we are working on large number of samples.
I did not face with this kind of errors with smaller number of samples (55 samples).
I would appreciate any comment
Nazanin
You get the error immediately upon running the last line or elsewhere? What threshold are you asking about in your post?
Yes, I got the error when I use
I want to know what thresholds we are allowed to use in :
and in:
Any particular reason for choosing 10 and 300 in
I have seen several papers/tutorials using different strategy like rowSums>0 to filter low count genes, but I could never get the real reason.