Hello all,
I recently ran into a problem where I get an unusually large number of downregulated genes compared to upregulated genes. I'm not sure whether this indicates a problem or whether it can legitimately happen. I used DESeq2 for the DE analysis. The data was normalized based on housekeeping genes. Do I have to change the design parameter in the DESeqDataSetFromMatrix() function? For the current iteration, the design was ~condition. Will I have to include the pooling information in the design (~condition+pool)? Thanks in advance!
Experimental design:
- Pool A - two controls and three experimental groups
- Pool B - three controls and three experimental groups
5 samples (biological replicates)
- Pool A - positive
- Pool A - positive
- Pool A - negative
- Pool A - negative
- Pool A - negative
6 samples (biological replicates)
- Pool B - positive
- Pool B - positive
- Pool B - positive
- Pool B - negative
- Pool B - negative
- Pool B - negative
DE comparison - positive vs negative (pools A and B together)
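For reference, the layout above can be written out as a sample table to check that pool and condition are not confounded before deciding between the two designs. This is only a sketch in base R: it takes the five Pool A and six Pool B samples as itemized in the lists, the sample names are made up, and "negative" is assumed to be the reference level because the comparison is stated as positive vs negative.

```r
# Sample table transcribed from the itemized lists above (11 samples).
# Sample names are hypothetical; only pool and condition come from the post.
coldata <- data.frame(
  pool = factor(c(rep("A", 5), rep("B", 6))),
  condition = factor(
    c("positive", "positive", "negative", "negative", "negative",
      "positive", "positive", "positive", "negative", "negative", "negative"),
    levels = c("negative", "positive")  # assumption: "negative" is the reference
  ),
  row.names = paste0("sample", 1:11)
)

# Both conditions occur in both pools, so a pool effect can be separated
# from the condition effect.
print(table(coldata$pool, coldata$condition))

# Design matrix for ~ pool + condition (variable of interest last).
mm <- model.matrix(~ pool + condition, data = coldata)
full_rank <- qr(mm)$rank == ncol(mm)
print(full_rank)

# With DESeq2 this would be passed as follows. `cts` is a placeholder for a
# matrix of raw (un-normalized) counts whose columns match rownames(coldata):
# dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts, colData = coldata,
#                                       design = ~ pool + condition)
```

If the matrix is full rank, both ~condition and ~pool+condition can be fitted; whether the pool term is actually needed is a separate question.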
You can try to prefilter this a bit. That cloud in the bottom left has low baseMeans and large fold changes, so these are probably genes with many zeros, which are rather unreliable. An automated way would be the edgeR function filterByExpr(). Including pool is only necessary if the pools are driving any separation; check the PCA for that. Is this normal RNA-seq or what?
Thank you for the recommendation! This was a low-input bulk RNA-seq.
Ok, so "standard" RNA-seq. Yeah, I would really try to prefilter a bit and also inspect the PCA. See DESeq2 manual, there is code for PCA in it.
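A minimal prefiltering sketch in base R, assuming a rule of "at least 10 counts in at least as many samples as the smallest group". The count matrix is simulated purely for illustration and the threshold of 10 is an assumption to adjust, not a recommendation for this dataset.

```r
set.seed(1)

# Simulated raw counts, illustrative only: 1000 genes x 11 samples.
cts <- matrix(
  rnbinom(1000 * 11, mu = 20, size = 0.5),
  nrow = 1000,
  dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:11))
)

# Condition labels in the order of the design listed in the question.
condition <- factor(c(rep("positive", 2), rep("negative", 3),
                      rep("positive", 3), rep("negative", 3)))

# Keep genes with >= 10 counts in at least `smallest_group` samples.
min_count <- 10                           # assumed threshold
smallest_group <- min(table(condition))   # 5 positives vs 6 negatives
keep <- rowSums(cts >= min_count) >= smallest_group
cts_filtered <- cts[keep, , drop = FALSE]

cat(sum(keep), "of", nrow(cts), "genes kept\n")

# The automated edgeR route mentioned above (requires edgeR to be installed):
# keep <- edgeR::filterByExpr(cts, group = condition)
```

Genes that are mostly zeros drop out under this rule, which should thin out the low-baseMean, large-fold-change cloud if that is what is driving it.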
I did do some pre-filtering before I ran DESeq. Could it be because of low read counts in some of the samples? I did some QC today and have attached the plots here.
I cannot say that these plots are very informative. Run a PCA.
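A rough sketch of such a PCA in base R, on simulated counts, with samples labelled by pool and condition so it is visible which of the two drives the separation. The log2(count + 1) transform and the choice of the 200 most variable genes are assumptions for illustration; with a DESeq2 object the commented lines at the end are the usual route.

```r
set.seed(1)

# Sample annotation following the design in the question.
pool <- factor(c(rep("A", 5), rep("B", 6)))
condition <- factor(c(rep("positive", 2), rep("negative", 3),
                      rep("positive", 3), rep("negative", 3)))

# Simulated raw counts, illustrative only: 500 genes x 11 samples.
cts <- matrix(rnbinom(500 * 11, mu = 50, size = 1), nrow = 500)
colnames(cts) <- paste(pool, condition, seq_along(pool), sep = "_")

# Simple log transform, then PCA on the most variable genes.
logcts <- log2(cts + 1)
gene_var <- apply(logcts, 1, var)
top <- order(gene_var, decreasing = TRUE)[1:200]
pca <- prcomp(t(logcts[top, ]), center = TRUE, scale. = FALSE)

var_explained <- pca$sdev^2 / sum(pca$sdev^2)
scores <- data.frame(PC1 = pca$x[, 1], PC2 = pca$x[, 2], pool, condition)
print(scores)

# Plot only in an interactive session: colour = condition, symbol = pool.
if (interactive()) {
  plot(scores$PC1, scores$PC2,
       col = as.integer(scores$condition),
       pch = c(16, 17)[as.integer(scores$pool)],
       xlab = sprintf("PC1 (%.1f%%)", 100 * var_explained[1]),
       ylab = sprintf("PC2 (%.1f%%)", 100 * var_explained[2]))
  legend("topright",
         legend = c(levels(condition), paste("pool", levels(pool))),
         col = c(1, 2, 1, 1), pch = c(15, 15, 16, 17))
}

# With a DESeqDataSet `dds` (placeholder name), the DESeq2 route would be:
# vsd <- DESeq2::vst(dds, blind = TRUE)
# DESeq2::plotPCA(vsd, intgroup = c("condition", "pool"))
```

If the samples split by pool rather than by condition along the first components, that is the case where adding pool to the design is worth it.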
Strongly agree. This is a data cleaning issue.