Entering edit mode
7.7 years ago
lakhujanivijay
5.9k
Command
results_transcripts=stattest(bg_filtered, feature="transcript", covariate="treatment", adjustvars=c("variety"), getFC=TRUE, meas="FPKM")
Error message:
Error in solve.default(t(mod) %*% mod) : system is computationally singular: reciprocal condition number = 6.66115e-17
my mapping file:
pheno_data
ids treatment variety
1 112-C control 112
2 112-S stress 112
3 111-C control 111
4 111-S stress 111
Commands ran before running stattest
:
bg = ballgown(dataDir=data_directory, samplePattern='11', meas='all', pData=pheno_data)
library(genefilter)
bg_filtered=subset(bg, "rowVars(texpr(bg)) > 1", genomesubset=TRUE)
SessionInfo
sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] genefilter_1.56.0 ballgown_2.6.0
loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 AnnotationDbi_1.36.1
[3] XVector_0.14.0 GenomicRanges_1.26.2
[5] BiocGenerics_0.20.0 splines_3.3.1
[7] zlibbioc_1.20.0 GenomicAlignments_1.10.0
[9] IRanges_2.8.1 BiocParallel_1.8.1
[11] xtable_1.8-2 lattice_0.20-34
[13] GenomeInfoDb_1.10.2 tools_3.3.1
[15] SummarizedExperiment_1.4.0 parallel_3.3.1
[17] grid_3.3.1 nlme_3.1-128
[19] Biobase_2.34.0 mgcv_1.8-12
[21] DBI_0.5-1 survival_2.40-1
[23] digest_0.6.12 Matrix_1.2-6
[25] sva_3.22.0 rtracklayer_1.34.1
[27] RColorBrewer_1.1-2 S4Vectors_0.12.1
[29] bitops_1.0-6 RCurl_1.95-4.8
[31] memoise_1.0.0 RSQLite_1.1-2
[33] limma_3.30.9 Biostrings_2.42.1
[35] Rsamtools_1.26.1 stats4_3.3.1
[37] XML_3.98-1.5 annotate_1.52.1
Troubleshooting from my side:
This is a known error (https://github.com/alyssafrazee/ballgown/issues/67), so here are the things I checked:
There are indeed some transcripts where the FPKM values are zero; e.g
checking for column1 and column2
any(texpr(bg_filtered)[,1]==0 & texpr(bg_filtered)[,2]==0)
[1] TRUE
checking for column3 and column4
any(texpr(bg_filtered)[,3]==0 & texpr(bg_filtered)[,4]==0)
[1] TRUE
checking for all the columns
any(texpr(bg_filtered)[,1]==0 & texpr(bg_filtered)[,2]==0 & texpr(bg_filtered)[,3]==0 & texpr(bg_filtered)[,4]==0)
[1] FALSE
* i.e. there are no rows where FPKM for all the samples is zero.
Is your design correct? By looking at the ids you use, I think it's supposed to be:
My bad! that was just a typo while posting on Biostars. Not a problem with the actual file I am using. Just corrected my post!
Would be nice to mention that you crossposted this to https://support.bioconductor.org/p/95379/
Cross-posting is discouraged, because as such two communities will spend time trying to help you.
In the hope that somebody would reply as early as possible, yes I did infact posted it on Bioconductor. Deleted now on bioconductor. Any insights to the problem?
I can understand why you would post it on both biostars and bioconductor support, but at least be honest and mention that.
My first test would be to simplify the design and drop the 'variety' column and see if the error is reproduced. I think your design doesn't allow estimation of 'influence' of the variety covariate on the expression levels since you don't have replicates.
You know that your number of replicates is rather small?
Ok, thank you! Next time, I will try not to cross post or at least will disclose that I did that in the very beginning.
Answering related to the problem, in fact I do not have any replicates (I know I should have at least 3 for my results to be statistically significant). But, I can ignore this at this point just for the understanding.
I will give it a try as you said to confirm that this is indeed a problem with the number of replicates.
Unfortunately, dropping it does help in a rather unusual way; i.e. somehow ballgown interpret rest of the 2 samples as replicates and works! However, as soon as I try the same by creating a mapping file for only 2 samples (1 control and 1 treated) and loading those 2 samples as a ballgown object, it fails at the same step saying that
At this point, I think that it is not possible to go on with ballgown without any replicates.
Just for understanding, what if I create pseudo-replicates by duplicating the same samples and try to import them. Technically speaking it should work but what could be the implications from statistical point of view?
You don't have replicates when you use this matrix:
Because your 4 real groups are, as judged by the program:
Hi, Did you solve it? I have two samples and I just duplicated the file because we should have at least two replicates. But, I am getting the same error. Can you show me?
HaroonPakistan Unfortunately, I could not resolve the issue. I ended up using another tool in this case.