DEXSeq: Problem With Constructing An ExonCountSet
11.1 years ago
hoangtv ▴ 30

Hi, could anyone please help me solve a problem with making an ExonCountSet? I am new to R, so I am struggling. I counted the 6 SAM files from GSNAP output using dexseq_count.py, following the DEXSeq manual, and then made a sample table. Here is what I did:

> sampleTable <- data.frame(row.names = c("E1", "E2", "E3", "F1", "F2", "F3"), countFile = c("E1.count", "E2.count", "E3.count", "F1.counts", "F2.count", "F3.count"), condition = c("E", "E", "E", "F", "F", "F"))
> sampleTable
   countFile condition
E1  E1.count         E
E2  E2.count         E
E3  E3.count         E
F1 F1.counts         F
F2  F2.count         F
F3  F3.count         F
> ecs <- read.HTSeqCounts(sampleTable$countFile, sampleTable, "protein_coding_flattened.gff")
Error in read.table(x, header = FALSE, stringsAsFactors = FALSE) : 
  'file' must be a character string or connection

Thank you very much, Thanh


If you type list.files(), do E1.count and the other files show up? Also, I suspect that F1.counts should be F1.count.
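As a rough sketch of the same check done programmatically, base R's file.exists() can test every expected count file at once. The file names are copied from the question (including the suspect F1.counts); the variable names present and missing_files are only illustrative.

```r
# File names as written in the question's sample table
count_files <- c("E1.count", "E2.count", "E3.count",
                 "F1.counts", "F2.count", "F3.count")

# file.exists() returns one TRUE/FALSE per path, relative to getwd()
present <- file.exists(count_files)
names(present) <- count_files
print(present)

# Any names listed here cannot be found in the working directory
missing_files <- count_files[!present]
print(missing_files)
```

If missing_files is not empty, either the working directory is wrong (check getwd()) or a file name in the table is misspelled.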


Hi dpryan79, thank you very much for your quick reply. F1.counts was just a typo in my previous message, sorry about that. Note that I installed HTSeq, flattened the annotation GTF file, and did the counting on a different machine, and then moved the data over to another one to process in R. I typed list.files(), and all the files seem to show up:

> list.files()
 [1] "CITATION"                     "DESCRIPTION"
 [3] "DEXSeq note 11.11.13.odt"     "DEXSeq_1.8.0.tar"
 [5] "doc"                          "E1.count"
 [7] "E2.count"                     "E3.count"
 [9] "F1.count"                     "F2.count"
[11] "F3.count"                     "help"
[13] "html"                         "INDEX"
[15] "Meta"                         "NAMESPACE"
[17] "NEWS"                         "protein_coding_flattened.gff"
[19] "python_scripts"               "R"


What is the output of class(sampleTable$countFile)? A common cause of this error is that the column is not a character vector.
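A minimal sketch of that check, assuming the sample table from the question with the file name typo corrected. stringsAsFactors = TRUE is passed explicitly because it was the data.frame() default before R 4.0.0 but no longer is; column_class and is_usable are illustrative names.

```r
# Rebuild the sample table the way older R versions would have by default
sampleTable <- data.frame(
  row.names = c("E1", "E2", "E3", "F1", "F2", "F3"),
  countFile = c("E1.count", "E2.count", "E3.count",
                "F1.count", "F2.count", "F3.count"),
  condition = c("E", "E", "E", "F", "F", "F"),
  stringsAsFactors = TRUE
)

# The string column has been converted to a factor
column_class <- class(sampleTable$countFile)
print(column_class)  # "factor"

# The error message asks for a character string (or connection) as the file
# argument, and a factor is not a character vector
is_usable <- is.character(sampleTable$countFile)
print(is_usable)     # FALSE
```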


Just to keep everyone in the loop, Alejandro Reyes (one of the DEXSeq authors) saw this same thread over on SEQanswers. The read.HTSeqCounts() function (and vignette) will get tweaked to avoid this error in the future. It's always a good sign when the authors of tools follow these sites and respond when there are issues!

11.1 years ago
Irsan ★ 7.8k

dpryan is right: the class of sampleTable$countFile is factor, but it should be character. Also, F1.counts should be F1.count. Try this:

sampleTable <- data.frame(row.names = c( "E1", "E2", "E3","F1", "F2", "F3" ), countFile = c( "E1.count", "E2.count", "E3.count", "F1.count","F2.count", "F3.count" ), condition = c( "E", "E", "E", "F", "F", "F" ),stringsAsFactors=FALSE)

When you do

sapply(sampleTable, class)

You will see that both your columns are of type character now.

Proceed with

ecs <-read.HTSeqCounts(sampleTable$countFile,sampleTable,"protein_coding_flattened.gff")
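
If the table already exists with factor columns, an alternative sketch is to convert just the file name column in place with as.character() rather than rebuilding the whole data frame. The table is recreated here only so the example runs on its own. In this variant the condition column stays a factor; the thread does not say whether read.HTSeqCounts() cares about that, so treat it as an assumption to verify.

```r
# Stand-in for an existing table whose string columns became factors
sampleTable <- data.frame(
  row.names = c("E1", "E2", "E3", "F1", "F2", "F3"),
  countFile = c("E1.count", "E2.count", "E3.count",
                "F1.count", "F2.count", "F3.count"),
  condition = c("E", "E", "E", "F", "F", "F"),
  stringsAsFactors = TRUE
)

# as.character() on a factor returns its labels, i.e. the original file names
sampleTable$countFile <- as.character(sampleTable$countFile)

# Confirm the column classes after the conversion
column_classes <- sapply(sampleTable, class)
print(column_classes)
```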