I am trying to create this figure as described in the DESeq vignette; rs is the row sum of counts. But I get a fair number of significant genes at low counts, as plotted in this figure. What could be the reason?
The two groups of samples are expected to be almost the same. They were processed in different batches, but at the same facility and by the same technician.
So you were trying to plot the average expression of the genes across all conditions by their rank? What do you want to achieve by drawing this graph? We usually plot the mean expression count against the log2 fold change and color the significant genes (or use a color gradient based on the p-value), e.g. page 7 of here
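A minimal base R sketch of that kind of plot (mean expression against log2 fold change, significant genes colored). The data frame `res_df` and its column names are stand-ins simulated here for illustration, not the poster's actual results; swap in your own table.

```r
# Simulated stand-in for a results table; column names are assumptions.
set.seed(1)
n <- 2000
res_df <- data.frame(
  baseMean       = rexp(n, rate = 1 / 200) + 1,  # strictly positive means
  log2FoldChange = rnorm(n, sd = 0.8),
  padj           = runif(n)
)

# Flag genes at FDR 10%; NA adjusted p-values count as not significant.
is_sig <- !is.na(res_df$padj) & res_df$padj < 0.1

# Mean expression on a log-scaled x axis, fold change on the y axis.
plot(res_df$baseMean, res_df$log2FoldChange,
     log = "x", pch = 20, cex = 0.4,
     col = ifelse(is_sig, "red", "grey50"),
     xlab = "mean of normalized counts",
     ylab = "log2 fold change")
abline(h = 0, col = "blue")
```

With real data, a cloud of red points at the far left of this plot would show directly whether the significant calls sit at genuinely low counts.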
Thanks! I was trying to see whether the filtering described by DESeq (removing genes whose row sum ranks in the bottom 40%) does anything to my analysis. But it actually yielded fewer DE genes.
> lFilt<-rep(+Inf, length(res$padj))
> lFilt[use]=Filt$padj #Filt is the result of filtering res
> tab3=table("nofil"=res$padj<.1, "fil"=lFilt<.1)
       fil
nofil   FALSE  TRUE   Sum
  FALSE 15839   259 16098
  TRUE   1050  2921  3971
  Sum   16889  3180 20069
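For reference, a self-contained sketch of this comparison using only base R. The row sums and raw p-values are simulated, and every variable name is hypothetical; the point is only the mechanics: adjust once over all genes, adjust again over the genes that pass the filter, then cross-tabulate. Here filtered-out genes get NA and are counted as not significant, rather than the +Inf placeholder used above.

```r
set.seed(42)
n <- 5000

# Hypothetical inputs: per-gene row sums and raw p-values.
rs   <- rnbinom(n, mu = 200, size = 0.5)
pval <- runif(n)
idx  <- sample(n, 500)
pval[idx] <- pval[idx] * 1e-4   # plant some small p-values

# Keep genes whose row sum is above the 40% quantile.
use <- rs > quantile(rs, probs = 0.4)

# Benjamini-Hochberg adjustment without and with the filter.
padj_all  <- p.adjust(pval, method = "BH")
padj_filt <- rep(NA_real_, n)
padj_filt[use] <- p.adjust(pval[use], method = "BH")

sig_all  <- padj_all < 0.1
sig_filt <- !is.na(padj_filt) & padj_filt < 0.1

# Cross-tabulate the two calls, with row and column totals.
tab <- addmargins(table(nofil = sig_all, fil = sig_filt))
print(tab)
```

Filtering is not guaranteed to add discoveries. If many of the small p-values belong to genes that the filter removes, the filtered set can end up with fewer significant genes, which would be consistent with seeing many significant genes at low rank.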
Note that the x axis is rank, so "low expressed" needs to be taken with a significant grain of salt, since a low rank could well correspond to a mean expression of 100 or so. If there's a batch effect, then one should expect a fair number of DE genes from that alone. I've seen this myself when I used the same extraction kit on different days.
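A quick way to check what a rank cutoff means in actual counts is to look at the corresponding quantile of the row sums. The sketch below uses simulated row sums and an assumed sample count of 6; both are placeholders for your own values.

```r
set.seed(7)

# Hypothetical row sums for 20000 genes.
rs <- rnbinom(20000, mu = 500, size = 0.3)

# Row sum sitting at the 40% rank cutoff.
cutoff <- unname(quantile(rs, probs = 0.4))
cutoff

# Rough per-sample count at the cutoff (n_samples is an assumption).
n_samples <- 6
cutoff / n_samples
```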
Anyway, that plot is meant more as a diagnostic plot so you can see if the automatic independent filtering is working properly.
Are you using DESeq rather than DESeq2? The latter will automatically filter for you. There is no reason to use DESeq rather than DESeq2 unless you're absolutely required to do so.