As my lab does not have a lot of computational power, I used Galaxy for alignment and HTSeq. To produce better graphs for downstream analysis, I had to switch to RStudio.
I am using the six samples from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132732. There are two treatment conditions, IL4 and M0.
I also got DESeq2 results from Galaxy, but I thought that running DESeq2 myself on the HTSeq counts would give me the consistent formatting I need later on.
The HTSeq output in Galaxy's tabular datatype has a header, but when I convert it to a .csv file, it no longer has one. So after downloading the .csv files, I added the header manually.
Below is the sorry excuse for code I attempted for DESeq2. I think, or rather I know, that sampleCondition is where I went very wrong, but I don't know how to correct it.
#make directory with htseq-counts
directory <- "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts"
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
sampleFiles <- grep("count",list.files(directory), value = TRUE)
sampleCondition <- c("IL4","M0")
sampleTable <- data.frame(sampleName = sampleFiles, fileName = sampleFiles, condition = sampleCondition)
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory,design=~condition)
ddsHTSeq
The following is the error:
Error in `.rowNamesDF<-`(x, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘pasilla_gene_counts.tsv’
> ddsHTSeq
Error: object 'ddsHTSeq' not found
Very, very grateful for your insight.
I assume there is something wrong with sampleTable: for example, the condition column has only 2 entries, but shouldn't it have 6? What is the output of just sampleTable?
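Edit: it looks like you forgot to remove the line
directory <- system.file("extdata", package = "pasilla", mustWork = TRUE)
First you assign directory to "/Users/mahejabeennidhi/Documents/DrKwan/rnaseq6samples/htseqcounts" (the htseq counts are there, right?), then you overwrite it with the pasilla example data folder. That is why the warning mentions pasilla_gene_counts.tsv: it is the only file matching "count" there, so it gets recycled against your two conditions and produces duplicate row names.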
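For reference, here is a minimal sketch of how the sample table could be built once the second directory assignment is removed. The file names are hypothetical placeholders (the real names in the htseqcounts folder are not shown in this thread), and the split into three IL4 files followed by three M0 files is an assumption that should be checked against the GEO sample annotation.

```r
# Hypothetical file names; in practice take them from list.files(directory).
sampleFiles <- c("IL4_rep1.count.tsv", "IL4_rep2.count.tsv", "IL4_rep3.count.tsv",
                 "M0_rep1.count.tsv", "M0_rep2.count.tsv", "M0_rep3.count.tsv")

# One condition label per file, in the same order as sampleFiles.
# Assumed layout: 3 x IL4 followed by 3 x M0, with M0 as the reference level.
sampleCondition <- factor(rep(c("IL4", "M0"), each = 3), levels = c("M0", "IL4"))

# sampleName must be unique per row; here it is the file name without its suffix.
sampleTable <- data.frame(sampleName = sub("\\.count\\.tsv$", "", sampleFiles),
                          fileName   = sampleFiles,
                          condition  = sampleCondition)

# Sanity checks before passing the table to DESeqDataSetFromHTSeqCount()
stopifnot(nrow(sampleTable) == 6, !anyDuplicated(sampleTable$sampleName))
print(sampleTable)
```

With a table like this, the call from the question, DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~condition), gets one uniquely named row per sample instead of a recycled one.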
I made the edits and am still getting an error.
My guess is that in the table you are loading, the genes end up as an extra column; that's why the number of columns does not match.
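One way to check this guess is to read a single count file on its own and look at its shape. The sketch below uses a small made-up file so that it runs anywhere; the gene IDs, counts, and header row are placeholders standing in for one of the real files. As far as I know, DESeqDataSetFromHTSeqCount expects headerless, tab-separated files with the gene ID in the first column, so a hand-added header row is worth ruling out.

```r
# Write a small stand-in for one htseq-count file (hypothetical content,
# including a manually added header row like the one described in the question).
tmp <- tempfile(fileext = ".tsv")
writeLines(c("Geneid\tcount", "geneA\t10", "geneB\t0", "__no_feature\t5"), tmp)

# Read it as a plain headerless, tab-separated table.
x <- read.table(tmp, sep = "\t", header = FALSE, stringsAsFactors = FALSE)

print(ncol(x))     # number of columns found in the file
print(head(x, 3))  # a header row shows up here as the first data row

# TRUE if the second column holds anything that is not a number,
# for example a header label such as "count".
has_non_numeric <- any(is.na(suppressWarnings(as.numeric(x[[2]]))))
print(has_non_numeric)
```

Running the same read.table() and checks on one of the real files (replacing tmp with its path) shows whether the column count and the count column look as expected.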
The error means that the object sampleFiles is likely to be empty (length 0). What is the output of
list.files(directory)
?
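To illustrate why sampleFiles can come back empty, here is a small sketch with a throwaway folder. The file names are hypothetical; the point is only that grep("count", ...) keeps nothing when no file name contains "count", for example after the files were renamed or saved as .csv.

```r
# Stand-in directory with hypothetical file names (none contains "count").
directory <- file.path(tempdir(), "htseqcounts_demo")
dir.create(directory, showWarnings = FALSE)
invisible(file.create(file.path(directory, c("IL4_1.csv", "M0_1.csv"))))

# Step 1: see what is actually in the folder.
allFiles <- list.files(directory)
print(allFiles)

# Step 2: the pattern from the question keeps only names containing "count".
n_count <- length(grep("count", allFiles, value = TRUE))
print(n_count)

# Step 3: a pattern based on the file extension instead.
sampleFiles <- grep("\\.csv$", allFiles, value = TRUE)
print(length(sampleFiles))
```

If list.files(directory) on the real folder returns character(0), the path itself is wrong; if it returns files but sampleFiles is empty, the grep pattern needs adjusting to the actual names.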