Entering edit mode
6.4 years ago
David_emir
▴
500
Hi All,
I am planning to conduct differential gene expression analysis for lung cancer data set. I have RNAseq count data from two lung cancer subtypes (LUSC & LUAD). I have a matrix of
> dim(raw_reads_lusc_luad) # My raw counts data from RNAseq
[1] 56830 1035
> head(pheno_lusc_luad) # My Phenotype file
sample status
1 TCGA-22-4593-01A lusc
2 TCGA-21-1077-01A lusc
3 TCGA-85-6175-01A lusc
4 TCGA-77-A5G6-01A lusc
5 TCGA-22-5483-01A lusc
6 TCGA-21-1072-01A lusc
> levels(DESeq.ds_lusc_luad$status)
[1] "luad" "lusc"
my workflow is as follows
rownames(raw_reads_lusc_luad) = make.names(raw_reads_lusc_luad$gene_symbol, unique=TRUE)
raw_reads_lusc_luad <- raw_reads_lusc_luad[ , -c (1) ]
DESeq.ds_lusc_luad <- DESeqDataSetFromMatrix(countData= raw_reads_lusc_luad,
colData = pheno_lusc_luad ,
design = ~1)
DESeq.ds_lusc_luad <- DESeq.ds_lusc_luad [ rowSums (counts ( DESeq.ds_lusc_luad ) )>0 , ]
DESeq.ds_lusc_luad <- estimateSizeFactors ( DESeq.ds_lusc_luad )
counts.sf_normalized_lusc_luad <- counts (DESeq.ds_lusc_luad , normalized = TRUE )
log.norm.counts_lusc_luad <- log2(counts.sf_normalized_lusc_luad + 1)
DESeq.rlog_lusc_luad <- varianceStabilizingTransformation(DESeq.ds_lusc_luad,
blind=FALSE)
> str(colData(DESeq.ds_lusc_luad)$status)
Factor w/ 2 levels "luad","lusc": 2 2 2 2 2 2 2 2 2 2 ...
Then I have applied the standard analysis process.
colData(DESeq.ds_lusc_luad )$status <- relevel(colData(DESeq.ds_lusc_luad)$status , "lusc","luad")
DESeq.ds_lusc_luad <- DESeq(DESeq.ds_lusc_luad)
ERROR is :
> DESeq.ds_lusc_luad <- DESeq(DESeq.ds_lusc_luad)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 10204 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing
Warning message:
**In DESeq(DESeq.ds_lusc_luad) :
the design is ~ 1 (just an intercept). is this intended?**
> resultsNames(DESeq.ds_lusc_luad)
[1] "Intercept"
> resultsNames(DESeq.ds_lusc_luad)
[1] "Intercept"
When I checked the status name it shows Error
> resultsNames(DESeq.ds_lusc_luad) # to check the status name
character(0)
> DGE.results <- results (DESeq.ds_lusc_luad , contrast=c("status", "lusc", "luad"), independentFiltering = TRUE , alpha = 0.05);
Error in results(DESeq.ds_lusc_luad, contrast = c("status", "lusc", "luad"), :
couldn't find results. you should first run DESeq()
Its throwing error in contrast fit. I am not able to understand this. Please let me know where I am missing. thanks in advance.
Regards,
Dav,
Session Info
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DESeq2_1.16.1 SummarizedExperiment_1.6.5
[3] DelayedArray_0.2.7 matrixStats_0.53.1
[5] Biobase_2.36.2 GenomicRanges_1.28.6
[7] GenomeInfoDb_1.12.3 IRanges_2.10.5
[9] S4Vectors_0.14.7 BiocGenerics_0.22.1
loaded via a namespace (and not attached):
[1] vsn_3.44.0 bit64_0.9-7 splines_3.5.0
[4] Formula_1.2-3 assertthat_0.2.0 affy_1.54.0
[7] latticeExtra_0.6-28 blob_1.1.1 GenomeInfoDbData_0.99.0
[10] yaml_2.1.19 pillar_1.2.3 RSQLite_2.1.1
[13] backports_1.1.2 lattice_0.20-35 limma_3.32.10
[16] glue_1.2.0 digest_0.6.15 RColorBrewer_1.1-2
[19] XVector_0.16.0 checkmate_1.8.5 colorspace_1.3-2
[22] preprocessCore_1.38.1 htmltools_0.3.6 Matrix_1.2-14
[25] plyr_1.8.4 XML_3.98-1.11 pkgconfig_2.0.1
[28] genefilter_1.58.1 zlibbioc_1.22.0 purrr_0.2.5
[31] xtable_1.8-2 scales_0.5.0 affyio_1.46.0
[34] BiocParallel_1.10.1 htmlTable_1.12 tibble_1.4.2
[37] annotate_1.54.0 ggplot2_3.0.0 nnet_7.3-12
[40] lazyeval_0.2.1 survival_2.42-4 magrittr_1.5
[43] memoise_1.1.0 foreign_0.8-70 BiocInstaller_1.26.1
[46] tools_3.5.0 data.table_1.11.4 stringr_1.3.1
[49] locfit_1.5-9.1 munsell_0.5.0 cluster_2.0.7-1
[52] AnnotationDbi_1.38.2 bindrcpp_0.2.2 compiler_3.5.0
[55] rlang_0.2.1 grid_3.5.0 RCurl_1.95-4.10
[58] rstudioapi_0.7 htmlwidgets_1.2 bitops_1.0-6
[61] base64enc_0.1-3 gtable_0.2.0 DBI_1.0.0
[64] R6_2.2.2 gridExtra_2.3 knitr_1.20
[67] dplyr_0.7.6 bit_1.1-14 bindr_0.1.1
[70] Hmisc_4.1-1 stringi_1.2.3 Rcpp_0.12.17
[73] geneplotter_1.54.0 rpart_4.1-13 acepack_1.4.1
[76] tidyselect_0.2.4
It says WARNING the design is ~ 1 (just an intercept). is this intended?**
Yes, I did notice that but couldn't understand, do you have any idea whats going wrong here?
That's happened because you told it to happen here:
DESeq.ds_lusc_luad <- DESeqDataSetFromMatrix(countData= raw_reads_lusc_luad, colData = pheno_lusc_luad , design = ~1)
tcga lung is a pretty difficult dataset.
Thanks, but what the issue? how to get this solved? O God got it! will try with ~status ! i am a such an idiot!