I am performing DE analysis using DESeq package. The single end illumina reads are bowtie aligned. I used the raw counts for DE as required by DESeq. I got the below shown normalized counts from the following code
temp = t(sizeFactors(cds))
sizematrix<-matrix(data=temp, nrow=nrow(countTable), ncol=ncol(temp), byrow=TRUE)
scaledcounts = countTable/sizematrix
This one of the feature I am showing here for an example. (S1_1,S1_2,S1_3 are replicates and similarly S2)
`S1_1 S1_2 S1_3 S2_1 S2_2 S2_3`
`0 0 0 5.87 12.60 21.85
The corresponding result after negative binomial Test from DESeq, I get the below data.
`Base Mean BaseMeanA BaseMeanB FoldChange log2FoldChange pVal pAdj
`6.72205848463734 0 13.4 Inf Inf 0.0004 0.00136
My question is in the given case, I would assume, S2 expresses this gene and S1 does not. In my filtering for significant genes, my biologist is more concerned about the log fold change besides the PValue. So filtering criteria would look something like Pval < 0.0001 & abs(log2Foldchange) > 1.5
.
In this case, I would not be able to deal with the infinite value. Would it make sense to make the count data non-zero (say by simply adding a constant value to all count data (constant, not too large)) and then perform DE analysis.
Another extension question would be, how to evaluate the quality of each sample based on count data... in terms of any sample outliers, etc!
Thanks!
Thank you,... That was really helpful.