I have a file of read counts from which I want to find the differentially expressed genes using DESeq2. The file was generated using featureCounts.
C11 C15 C19 C23 N11 N15 N19 N23
NM_000014 4422 14216 8885 17031 8162 4811 12536 8273
NM_000015 3 0 7 2 0 9 2 6
NM_000016 1063 1192 1608 1345 1118 951 943 1120
NM_000017 164 424 463 507 603 692 494 653
NM_000018 5193 12982 11382 11716 10030 14180 9379 13316
NM_000019 654 1103 1106 1184 743 497 569 844
When I try to create a DESeq2 object using DESeqDataSetFromMatrix, I get the following error:
DESeq.ds <- DESeqDataSetFromMatrix(countData = readcounts,colData = sample_info,design = ~ condition)
Error in DESeqDataSet(se, design = design, ignoreRank): some values in assay are not integers
Traceback:
1. DESeqDataSetFromMatrix(countData = readcounts, colData = sample_info, design = ~condition)
2. DESeqDataSet(se, design = design, ignoreRank)
3. stop("some values in assay are not integers")
I checked the entire read counts file and there are no non-integer values in it, so I don't understand why this error keeps occurring. I tried running the sapply(readcounts, class) command as suggested in this thread (which did not give a clear solution) and get the following output:
C11 'numeric'
C15 'numeric'
C19 'numeric'
C23 'numeric'
N11 'numeric'
N15 'numeric'
N19 'numeric'
N23 'numeric'
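A side note on this output: a class of 'numeric' only says the columns are stored as doubles, not that every value is a whole number, so it can't rule out stray fractional or missing values. As far as I know, DESeq2 accepts numeric columns as long as every value equals its rounded value. A minimal base-R sketch for locating offending entries, assuming readcounts is an all-numeric data frame (the toy values below are invented for illustration):

```r
# Toy stand-in for readcounts; the values are invented for illustration.
readcounts <- data.frame(
  C11 = c(4422, 3, 1063.5),
  N11 = c(8162, 0, 1118),
  row.names = c("NM_000014", "NM_000015", "NM_000016")
)

m <- as.matrix(readcounts)

# TRUE where a value is missing or has a fractional part.
bad <- is.na(m) | m != round(m)

# Row/column positions of the offending entries.
bad_idx <- which(bad, arr.ind = TRUE)
print(bad_idx)

# The offending rows in full, so the gene IDs are visible.
bad_rows <- readcounts[rowSums(bad) > 0, , drop = FALSE]
print(bad_rows)
```

If bad_rows comes back empty on the real data, the problem is more likely in how the file was imported or reshaped than in the counts themselves.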
I tried using DESeqDataSet instead, but that requires a RangedSummarizedExperiment object from the function summarizeOverlaps from the GenomicAlignments package. The summarizeOverlaps function does the same job as featureCounts: it generates read counts. I don't want to repeat that step.
How did you obtain the count matrix? Are these raw (= non-normalized) counts? Please show head(readcounts). You can also try as.integer(readcounts), given that these are indeed integers but are somehow misclassified as characters or factors. How did you import the count matrix?
The file was given to me by my professor as a csv file. Yes, these are raw counts. I imported the file using the read.table command. as.integer(readcounts) doesn't work because readcounts is a data frame.
Come on, as.integer(as.matrix(readcounts))
Ah sorry XP New to R.
Edit: Sorry, it worked. Should I convert the integer object back to a data frame? Again, sorry if the question is too obvious.
No problem, sorry, did not intend to sound harsh :) mode(readcounts) <- "integer" is the last thing I could think of.
Have you tried importing just the top of the file? Maybe there is one line that got corrupted. Is dim(readcounts) what you expect? You might have a weird whitespace hiding in there somewhere.
might be worth a try
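On the question of converting back to a data frame: as.integer() applied to a matrix returns a plain vector, so the dimensions and the gene/sample names are lost, and any fractional value is silently truncated, which can hide the real problem. Changing the storage mode keeps the matrix intact instead. A small sketch with invented toy values, assuming readcounts holds only count columns:

```r
# Toy stand-in for readcounts; the values are invented for illustration.
readcounts <- data.frame(
  C11 = c(4422, 3, 1063),
  N11 = c(8162, 0, 1118),
  row.names = c("NM_000014", "NM_000015", "NM_000016")
)

# Check that the import produced the expected shape (genes x samples).
print(dim(readcounts))

# as.integer() flattens the matrix to a plain vector: dim and names are lost.
flat <- as.integer(as.matrix(readcounts))
print(is.null(dim(flat)))

# Changing the storage mode keeps the matrix shape and dimnames.
count_mat <- as.matrix(readcounts)
stopifnot(all(count_mat == round(count_mat)))  # refuse to truncate silently
storage.mode(count_mat) <- "integer"
str(count_mat)
```

To my knowledge, an integer matrix like count_mat can be passed directly as countData, so there should be no need to go back to a data frame.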
Hey, thank you SO MUCH for asking me to check the dimensions of my data frame. I was indeed missing around 8000 genes. I figured out the error; I was using this code to make my Gene_IDs unique and convert them to row names:
That block of code was somehow chopping off the last 8k of my genes. I also figured that it was adding some non-integer values. So I replaced the code with this instead:
I found that bit of magic here. I was able to create the DESeq2 object now. Thanks swbarnes and ATpoint!
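For anyone landing here with the same problem: one way to make gene IDs unique and use them as row names without losing rows is base R's make.unique(). This is only a sketch and not necessarily the code used above; the column name Gene_ID and the toy values are assumptions.

```r
# Toy input with a duplicated ID; column name and values are assumptions.
raw <- data.frame(
  Gene_ID = c("NM_000014", "NM_000015", "NM_000015"),
  C11 = c(4422, 3, 5),
  N11 = c(8162, 0, 2),
  stringsAsFactors = FALSE
)

n_before <- nrow(raw)

# make.unique() appends .1, .2, ... to repeated IDs instead of dropping them.
rownames(raw) <- make.unique(raw$Gene_ID)

# Keep only the count columns.
readcounts <- raw[, setdiff(names(raw), "Gene_ID"), drop = FALSE]

# Sanity check: no genes were lost along the way.
stopifnot(nrow(readcounts) == n_before)
print(rownames(readcounts))
```

Comparing nrow() before and after each reshaping step is a cheap way to catch silently dropped genes.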