Question

DESeq DataSet from HTSeqCount Error

4.7 years ago

mahejabeen.nidhi ▴ 20

As my lab does not have a lot of computational power, I used Galaxy for alignment and HTSeq. To produce better graphs for downstream analysis, I had to switch to RStudio.

I am using the six samples from GEO series GSE132732 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132732). There are two treatment conditions, IL4 and M0.

I also got DESeq2 results using Galaxy, but I thought that running DESeq2 myself from the HTSeq counts would give the consistent formatting required later on.

The HTSeq output in Galaxy's tabular datatype has a header, but when I convert it to a .csv file, it no longer has one. So after downloading the .csv files, I added the header manually.

Below is the sorry excuse for code I attempted for DESeq2. I think, or rather I know, that sampleCondition is where I went very wrong, but I don't know how to correct it.

#make directory with htseq-counts
directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
sampleFiles <- grep("count",list.files(directory), value = TRUE)
sampleCondition <- c("IL4","M0")
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
ddsHTSeq

The following is the error

Error in `.rowNamesDF<-`(x, value = value) : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘pasilla_gene_counts.tsv’ 
> ddsHTSeq
Error: object 'ddsHTSeq' not found

Very very grateful for your insight.

RNA-Seq Galaxy HTSeq • 1.9k views

ADD COMMENT • link 4.7 years ago by mahejabeen.nidhi ▴ 20

I assume there is something wrong with sampleTable. For example, the column condition has only 2 entries, but shouldn't it have 6, one per sample? What is the output of just sampleTable?
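To illustrate how a two-entry condition vector could lead to the duplicate row names error, here is a minimal base R sketch. It assumes that grep() matched only a single file name (the one shown in the warning message); data.frame() then recycles that one name across both conditions, and DESeqDataSetFromHTSeqCount presumably fails when it tries to use the repeated sample name as row names.

```r
# One matching file name, as the warning message suggests
sampleFiles <- "pasilla_gene_counts.tsv"
sampleCondition <- c("IL4", "M0")

# data.frame() recycles the length-1 vector to match the length-2 vector
sampleTable <- data.frame(sampleName = sampleFiles,
                          fileName = sampleFiles,
                          condition = sampleCondition)
print(sampleTable)

# The same sample name now appears twice, so it cannot serve as unique row names
dup <- anyDuplicated(sampleTable$sampleName) > 0
print(dup)
```

Printing sampleTable before calling DESeq2 makes this kind of silent recycling easy to spot.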

Edit: it looks like you forgot to remove the line

directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)

First you assign directory to "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts" (the htseq counts are there, right?), then you overwrite it with the extdata folder of the pasilla package, so your own files are never listed.
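A rough sketch of what the corrected setup might look like, with directory assigned only once. The temporary folder and the six file names are hypothetical stand-ins (the real Galaxy downloads will be named differently), and deriving the condition from the file name is just one possible approach that assumes the condition is the prefix before the first underscore.

```r
# Hypothetical stand-in for the real htseqcounts folder, so the sketch runs anywhere
directory <- file.path(tempdir(), "htseqcounts_demo")
dir.create(directory, showWarnings = FALSE)

# Hypothetical file names; replace with whatever list.files() shows for the real folder
demoFiles <- c("IL4_rep1_count.tsv", "IL4_rep2_count.tsv", "IL4_rep3_count.tsv",
               "M0_rep1_count.tsv", "M0_rep2_count.tsv", "M0_rep3_count.tsv")
invisible(file.create(file.path(directory, demoFiles)))

# `directory` is assigned once above and not overwritten afterwards
sampleFiles <- grep("count", list.files(directory), value = TRUE)

# Take the condition from each file name so it always lines up with its file
sampleCondition <- sub("_.*$", "", sampleFiles)

sampleTable <- data.frame(sampleName = sub("\\.tsv$", "", sampleFiles),
                          fileName = sampleFiles,
                          condition = factor(sampleCondition))
print(sampleTable)
```

With one row per file and unique sample names, this table has the shape that the DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~condition) call from the question expects.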

ADD REPLY • link 4.7 years ago by e.rempel ★ 1.1k

I made the edits and am still getting an error:

> #make directory with htseq-counts
> directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
> sampleFiles <- grep("count",list.files(directory), value = TRUE)
> sampleCondition <- c("IL4","M0","IL4","M0","IL4","M0")
> sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
Error in data.frame(sampleName = sampleFiles, fileName = sampleFiles,  : 
  arguments imply differing number of rows: 0, 6
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts/pasilla_gene_counts.tsv': No such file or directory
> ddsHTSeq
Error: object 'ddsHTSeq' not found

ADD REPLY • link 4.7 years ago by mahejabeen.nidhi ▴ 20

My guess is that in the table you are loading the genes as an extra column; that's why the number of columns does not match.

ADD REPLY • link 4.7 years ago by biofalconch ★ 1.3k

arguments imply differing number of rows: 0, 6

This means that the object sampleFiles is likely to be empty (it has length 0), i.e. grep() found no file name containing "count" in that directory. What is the output of list.files(directory)?
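The situation can be reproduced in base R with an empty folder. In this sketch the temporary directory is a hypothetical stand-in for a folder with no matching file names, and the length check at the end is only a suggestion. Note also that the later "cannot open file ... pasilla_gene_counts.tsv" message suggests an older sampleTable from the previous attempt was still in the workspace, because the new data.frame() call failed and never replaced it.

```r
# Hypothetical empty folder standing in for a directory with no matching files
directory <- file.path(tempdir(), "empty_demo")
dir.create(directory, showWarnings = FALSE)

sampleFiles <- grep("count", list.files(directory), value = TRUE)
sampleCondition <- c("IL4", "M0", "IL4", "M0", "IL4", "M0")

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    data.frame(sampleName = sampleFiles, fileName = sampleFiles,
               condition = sampleCondition)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A simple check that reports the mismatch before data.frame() is called
filesMatch <- length(sampleFiles) == length(sampleCondition)
if (!filesMatch) {
  message("Found ", length(sampleFiles), " count files in ", directory,
          " but ", length(sampleCondition), " conditions were given")
}
```

If list.files(directory) does return the six files, the pattern passed to grep() may simply not occur in their names, in which case a different pattern (or no filtering at all) would be needed.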

ADD REPLY • link 4.7 years ago by e.rempel ★ 1.1k