Hi all,
I am having problem in diff. exon usage analysis of my dataset. I generated count data using protocol mentioned in DEXSeq's manual. My runs seem to be halted after some time when DEXSeq function is called. I have also tried manual steps which are being called in DEXSeq function to see where the problem is. And it seems that estimateDispersions()
has to o something with this problem. I am running my job on cluster with 16 cores (512GB) for 48 hours but no progress. The output remained the same as shown below without any changes after first two hours of the run. Below is the sample of my counts data, annotation file and script output produced after 48 hours. Can anybody help me resolving the issue or what I am doing wrong here?
COUNT FILE
ENSG00000000003:001 105
ENSG00000000003:002 145
ENSG00000000003:003 139
ENSG00000000003:004 136
ENSG00000000003:005 139
ENSG00000000003:006 17
ENSG00000000003:007 33
ENSG00000000003:008 46
ENSG00000000003:009 185
.
.
.
_ambiguous 9946
_ambiguous_readpair_position 0
_empty 9520711
_lowaqual 0
_notaligned 0
FLAT FILE:
chr1 dexseq_prepare_annotation.py aggregate_gene 29554 31109 . + . gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 29554 30039 . + . transcripts "ENST00000473358"; exonic_part_number "001"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 30267 30365 . + . transcripts "ENST00000469289"; exonic_part_number "002"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 30366 30503 . + . transcripts "ENST00000607096+ENST00000469289"; exonic_part_number "003"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 30504 30563 . + . transcripts "ENST00000469289"; exonic_part_number "004"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 30564 30667 . + . transcripts "ENST00000469289+ENST00000473358"; exonic_part_number "005"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 30976 31097 . + . transcripts "ENST00000469289+ENST00000473358"; exonic_part_number "006"; gene_id "ENSG00000243485"
chr1 dexseq_prepare_annotation.py exonic_part 31098 31109 . + . transcripts "ENST00000469289"; exonic_part_number "007"; gene_id "ENSG00000243485"
OUTPUT script.Rout
R version 3.2.1 (2015-06-18) -- "World-Famous Astronaut"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
Creating a generic function for 'nchar' from package 'base' in package 'S4Vectors'
[Previously saved workspace restored]
>
> #source("https://bioconductor.org/biocLite.R")
> #biocLite("DEXSeq")
> #pythonScriptsDir = system.file( "python_scripts", package="DEXSeq" )
> #list.files(pythonScriptsDir)
> #/Library/Frameworks/R.framework/Versions/3.2/Resources/library/DEXSeq/python_scripts
>
> library(DEXSeq)
Loading required package: BiocParallel
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following object is masked from 'package:stats':
xtabs
The following objects are masked from 'package:base':
anyDuplicated, append, as.data.frame, as.vector, cbind, colnames,
do.call, duplicated, eval, evalq, Filter, Find, get, intersect,
is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax,
pmax.int, pmin, pmin.int, Position, rank, rbind, Reduce, rep.int,
rownames, sapply, setdiff, sort, table, tapply, union, unique,
unlist, unsplit
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: DESeq2
Loading required package: Rcpp
Loading required package: RcppArmadillo
> inDir = "dexseq_counts"
> files_fc <- list.files(inDir, pattern = "*fc", full.names = TRUE) # ".tab$"
>
> #-----------------------------Creating meta-data-----------------------------------------------#
>
> sampleTable = data.frame(row.names = c("cer_fetal_cyto_063_4","cer_fetal_cyto_063_6","cer_fetal_cyto_063_8", "cer_fetal_nuc_064_4","cer_fetal_nuc_064_6","cer_fetal_nuc_064_8"),
+ condition = c("cyto", "cyto", "cyto", "nuc", "nuc", "nuc"),
+ individual = c( "1", "2", "3", "1", "2", "3"))
>
> #-------------------------Running DEXSeq---------------------------------------#
>
> BPPARAM = MulticoreParam(workers=16)
>
> fullmodel= ~ sample + exon + individual:exon + condition:exon
> reduced_model = ~ sample + exon + individual:exon
> flatfile = "Homo_sapiens.GRCh37.75_protein_lincRNA.DEXSeq.chr.gff"
>
> dxd_fc = DEXSeqDataSetFromHTSeq(files_fc,sampleData=sampleTable, design= fullmodel, flattenedfile = flatfile)
converting counts to integer mode
> dxr_fc = DEXSeq(dxd_fc, reducedModel = reduced_model,BPPARAM = BPPARAM, quiet = FALSE)
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
using supplied model matrix
Regards,
adnan
is it still running?
You might check to see if the job is getting killed automatically. The thing with clusters is that sometimes they have limits on how long a job can run. Perhaps this is just getting killed early and the notice of that just isn't getting written to a file. So do as Martombo suggested and check if it's still running and, if not, ask the cluster admin if the job is getting killed. If it is still running, you might be able to ask the admin to check and see if it's actually using the allocated cores or if it seems to just be sitting there and doing nothing (in which case he/she can just kill it).