Hi, all
Dose anyone familiar with the use of goseq package here, I always get quick reply here, so I post my questions here, hope for some suggestions.
I am trying to use goseq for RNAseq data analysis of a non-native bacteria, I fetch the genelength data and GO term data by using ensembl Perl API. Differential expression was done by DESeq2.
After data run pwf of goseq, error occurred as following:
###
Error in if (hi <= low) { : missing value where TRUE/FALSE needed
It seems to be the problem of the names or ID problem, but I could not solved it till now.
Have any one met this before? How could I solved this problem?
Thanks!
My workflow were as following
## import the genes ID which I fetch from Ensembl
HZ254gene_data <- read.delim("E:/HZ_GO_enrichment/HZ254gene_data.txt", quote="")
colnames(HZ254gene_data)[1]="locus_tag"
## get assayed gene vector
assayed.genes=HZ254gene_data$locus_tag
assayed.genes1=as.vector(assayed.genes)
is.vector(assayed.genes1)
[1] TRUE
head (assayed.genes1)
[1] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" "Mtc_0006"
str(assayed.genes1)
chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" "Mtc_0005" ...
## get gene length data for pwf bias.data
length=HZ254gene_data$length
is.vector(length)
[1] TRUE
names(length)=assayed.genes1
head(length)
Mtc_0001 Mtc_0002 Mtc_0003 Mtc_0004 Mtc_0005 Mtc_0006
768 654 1899 1770 351 654
#
str(length)
Named int [1:2512] 768 654 1899 1770 351 654 903 72 966 1071 ...
- attr(*, "names")= chr [1:2512] "Mtc_0001" "Mtc_0002" "Mtc_0003" "Mtc_0004" ...
##import results from DESeq2 and get the DEgenes
HZ254DifList_Mer <- read.delim("E:/HZ_GO_enrichment/HZ254DifList_Mer.txt")
de.genes1=as.integer(HZ254DifList_Mer$padj<0.05)
##get the genes for pwf
gene.vector=as.integer(de.genes1)
names(gene.vector)=assayed.genes1
head(gene.vector)
##import the GO_gene association table from ensembl API
HZ254gene_go <- read.delim("E:/HZ_GO_enrichment/HZ254gene_go.txt", quote="")
go.terms=HZ254gene_go
## run pwf,
pwf=nullp(gene.vector,bias.data=length)
##got errors
Error in if (hi <= low) { : missing value where TRUE/FALSE needed
sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] goseq_1.14.0 geneLenDataBase_0.99.12 BiasedUrn_1.06.1
[4] BiocInstaller_1.12.0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 Biobase_2.22.0 BiocGenerics_0.8.0
[4] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
[7] BSgenome_1.30.0 DBI_0.2-7 GenomicFeatures_1.14.2
[10] GenomicRanges_1.14.4 grid_3.0.2 IRanges_1.20.6
[13] lattice_0.20-24 Matrix_1.1-2 mgcv_1.7-28
[16] nlme_3.1-113 parallel_3.0.2 RCurl_1.95-4.1
[19] Rsamtools_1.14.3 RSQLite_0.11.4 rtracklayer_1.22.3
[22] stats4_3.0.2 tools_3.0.2 XML_3.98-1.1
[25] XVector_0.2.0 zlibbioc_1.8.0
Hi, liupfskygre, It's hard to know exactly what the issue is without a reproducible example, but perhaps the assignment of values to variable names, which are also base R functions, could be a problem. Try replacing
length=HZ254gene_data$length
withlgth=HZ254gene_data$length
. If thenullp
function callslength
(the function), your use of this name as a variable might be a conflict.thanks for your reminding! I recheck the data and found where the problems is, there were some genes in the dataset that do not have expression data, which were assigned as NA!