Entering edit mode
2.1 years ago
ta_awwad
▴
350
Hi everyone,
first, I am not expert in using DEXseq. I am trying to analyze DEU from two different datasets. I am just worried about the batch effect. I tried to use svaseq function as suggested by others to remove this effect but ended up with an error message:
> sampleAnno
file condition libType type
1 ES.1.txt ESC paired-end 1
2 ES.2.txt ESC paired-end 1
3 LV_rep1.txt LV paired-end 2
4 LV_rep2.txt LV paired-end 2
> dxd = DEXSeqDataSetFromHTSeq(
countfiles = sampleAnno$file,
sampleData = sampleAnno,
design= ~ sample + exon + condition:exon,
flattenedfile = gffFile)
> dxd = estimateSizeFactors( dxd )
> dxd <- dxd[rowMeans(counts(dxd))>1,]
> dat <- counts(dxd, normalized=TRUE)
> library(sva)
> mod1 <- model.matrix(~ sample + exon , colData(dxd))
> mod0 <- model.matrix(~ 1, colData(dxd))
> svseq <- svaseq(dat, mod1, mod0, n.sv=2, B=5)
> dxdsva = dxd
> dxdsva$SV1=svseq$sv[,1]
> dxdsva$SV2=svseq$sv[,2]
> design(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
> dxdsva <- estimateDispersions( dxdsva )
Error: BiocParallel errors
1 remote errors, element index: 1
0 unevaluated and other errors
first remote error:
Error in estimateDispersionsGeneEst(x, maxit = maxit, quiet = quiet, modelMatrix = modelMatrix, : the number of samples and the number of model coefficients are equal,
i.e., there are no replicates to estimate the dispersion.
use an alternate design formula
sessionInfo( )
R version 4.2.0 (2022-04-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.2.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] sva_3.44.0 genefilter_1.78.0 mgcv_1.8-40 nlme_3.1-159
[5] forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.5
[9] readr_2.1.3 tidyr_1.2.1 tibble_3.1.8 ggplot2_3.3.6
[13] tidyverse_1.3.2 DEXSeq_1.42.0 RColorBrewer_1.1-3 AnnotationDbi_1.58.0
[17] DESeq2_1.36.0 SummarizedExperiment_1.26.1 GenomicRanges_1.48.0 GenomeInfoDb_1.32.4
[21] IRanges_2.30.1 S4Vectors_0.34.0 MatrixGenerics_1.8.1 matrixStats_0.62.0
[25] Biobase_2.56.0 BiocGenerics_0.42.0 BiocParallel_1.30.3
loaded via a namespace (and not attached):
[1] fs_1.5.2 bitops_1.0-7 lubridate_1.8.0 bit64_4.0.5 filelock_1.0.2
[6] progress_1.2.2 httr_1.4.4 tools_4.2.0 backports_1.4.1 utf8_1.2.2
[11] R6_2.5.1 DBI_1.1.3 colorspace_2.0-3 withr_2.5.0 tidyselect_1.1.2
[16] prettyunits_1.1.1 bit_4.0.4 curl_4.3.3 compiler_4.2.0 rvest_1.0.3
[21] cli_3.4.1 xml2_1.3.3 DelayedArray_0.22.0 scales_1.2.1 rappdirs_0.3.3
[26] digest_0.6.29 Rsamtools_2.12.0 XVector_0.36.0 pkgconfig_2.0.3 limma_3.52.4
[31] dbplyr_2.2.1 fastmap_1.1.0 readxl_1.4.1 rlang_1.0.6 rstudioapi_0.14
[36] RSQLite_2.2.18 generics_0.1.3 jsonlite_1.8.2 hwriter_1.3.2.1 vroom_1.6.0
[41] googlesheets4_1.0.1 RCurl_1.98-1.9 magrittr_2.0.3 GenomeInfoDbData_1.2.8 Matrix_1.5-1
[46] Rcpp_1.0.9 munsell_0.5.0 fansi_1.0.3 lifecycle_1.0.3 edgeR_3.38.4
[51] stringi_1.7.8 zlibbioc_1.42.0 BiocFileCache_2.4.0 grid_4.2.0 blob_1.2.3
[56] parallel_4.2.0 crayon_1.5.2 lattice_0.20-45 haven_2.5.1 Biostrings_2.64.1
[61] splines_4.2.0 annotate_1.74.0 hms_1.1.2 KEGGREST_1.36.3 locfit_1.5-9.6
[66] pillar_1.8.1 geneplotter_1.74.0 codetools_0.2-18 biomaRt_2.52.0 reprex_2.0.2
[71] XML_3.99-0.11 glue_1.6.2 modelr_0.1.9 tzdb_0.3.0 png_0.1-7
[76] vctrs_0.4.2 cellranger_1.1.0 gtable_0.3.1 assertthat_0.2.1 cachem_1.0.6
[81] xtable_1.8-4 broom_1.0.1 survival_3.4-0 googledrive_2.0.0 gargle_1.2.1
[86] memoise_2.0.1 statmod_1.4.37 ellipsis_0.3.2
any help?
many many thanks TA
This error from
DESeq2::estimateDispersions
is saying that there aren't any replicate samples in your design and so an average dispersion value for a given condition can't be calculated. If you Google the error message you will get some more information, some good examples include https://support.bioconductor.org/p/134505/ and https://support.bioconductor.org/p/89746/Based on
sampleAnno
it does look like you have replicates so my guess is thatdesign(dxdsva) <- ~ SV1 + SV2 + sample + exon + condition:exon
is not producing an appropriate model or perhaps the model design is confounded?The other thing to note is that you appear to have performed
estimateSizeFactors
before runningsva
. I believe the recommendation is to use raw counts data forsva
then run throughestimateSizeFactors
,estimateDispersions
, etc. with the new model design https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#using-sva-with-deseq2 (though I'm not sure if the recommendation is different for DEXSeq)