Dear all,
I have an experiment with two different conditions, two treated and two untreated samples from where I performed RNA sequencing. here is the code that I'm using for the expression analysis
countdata <- ( as.matrix(read.table("/RNAseq/counts/DESeq2_STAR/count.table", header=TRUE,row.names=1)))
condition <- factor(c(rep("treated",2), rep("Untreated",2)))
coldata <-data.frame(row.names=colnames(countdata),condition)
library("DESeq2")
everything is ok until I use DESeq2
dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)
I can't find a way to make the integer number, and I don't know if I have to fix something with featureCounts or I can do this in R.
please help,
thank you
here the result
head(countdata)
mapping.star.Treated_1.bam mapping.star.Treated_2.bam mapping.star.Untreated_1.bam mapping.star.Untreated_2.bam ENSG00000223972 7.12 58.49 119.48 27.45 ENSG00000227232 2793.80 2960.74 2410.36 1630.14 ENSG00000243485 30.00 16.32 2.93 20.52 ENSG00000237613 0.00 1.00 0.00 1.83 ENSG00000268020 1.00 0.00 0.00 0.50 ENSG00000240361 0.00 0.00 0.00 0.00
condition
[1] treated treated Untreated Untreated Levels: treated Untreated
head(coldata)
condition mapping.star.Treated_1.bam treated mapping.star.Treated_2.bam treated mapping.star.Untreated_1.bam Untreated mapping.star.Untreated_2.bam Untreated
library("DESeq2")
dds <- DESeqDataSetFromMatrix(countdata, coldata, design=~condition)
**Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are not integers**
1) This is not a forum, please remove this tag.
2) Normally, you should have integers going into DESeq2. This is likely a featurecounts "issue". Please share how you generated your counts from featurecounts. Is this gene-level data?
Hi Ishepard, I don't know what tag you mean, I'm sorry if I did something wrong.
Here it is what I used to generate the count file with featureCounts and yes, this is a gene-level data
thanks
Here is your problem:
--fraction
will generate fractional counts. featureCounts is not appropriate for counting multi-mapped reads, you shouldn't use-M --fraction
. If you want to count multi-mapped reads, use Salmon, RSEM or something similar.Ideally if I remove this command -M --fraction I should have integers value, I will loose some information but I can still go further with the analysis, isn't it?
thank you
Yes indeed.
But you won't loose any information, because featureCounts use the naive method of splitting the counts by the number of positions a read mapped. Salmon, kallisto and RSEM, on the other hand, use a expectation-maximization maximum likelihood algorithm, which will apportion multi-mapped reads based on the proportion of uniquely mapped reads mapping to the same feature.
Which is just a long way of stating the algorithm featureCounts use for counting multi-mappers is bad and should be avoided.
Or, use
--fraction
, but round your counts before giving them to DESeq.Thank you to all for the precious advices, I got my first heatmap :)
Now I have to find a way to associate the gene name instead of the gene ID
biomaRt package is a good annotation package as an fyi. You can add gene ID, descriptions, coordinates etc...