I'm conducting a differential expression analysis using DESeq2. Before running the analysis I removed the lowly expressed genes.
In the DESeq2 results I got around 1000 low-count genes, so I went back and cleaned the raw data even more.
I ran DESeq2 again; now the low counts in the results are around 600, but the number of DEGs also went down.
I don't know if there is an answer to my question, but I'll ask anyway: how many low counts are acceptable? I don't want to lose DEGs, but I also don't want noise (low counts) in my data. So what do you recommend doing? Should I go back to the 1000 low counts? Should I eliminate even more low-count genes even though it's causing the number of DEGs to drop?
The appropriate play, in my opinion, for DESeq2 specifically, is to run the QC on everything, THEN remove outliers/low counts/bad quality, THEN run association testing.
Why?
You don't have to read any further than paragraphs 2 through 4 of the background of the DESeq2 publication itself. Paragraph 4 addresses low counts specifically. Think about what Mike Love is saying there, then Ctrl+F for "size factors", and think about what he is doing and why.
While the implementation of something like DESeq2 is very technical and difficult, the concepts behind it are not, and they will fully answer your question.
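To make the size-factor idea a bit more concrete, here is a minimal Python sketch of a median-of-ratios calculation plus a simple low-count pre-filter. This is not DESeq2 code and not its API: the function names, the toy counts, and the thresholds (`min_count=10`, `min_samples=2`) are all made up for illustration, and the real package does considerably more than this.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch only).

    counts: list of genes, each a list of raw counts, one per sample.
    Genes with a zero in any sample are skipped, because their
    geometric mean across samples is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue
        # log of the geometric mean of this gene across samples
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # one factor per sample: the median ratio to the per-gene reference
    return [math.exp(median(r)) for r in log_ratios]


def prefilter(counts, min_count=10, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples.

    Both thresholds are hypothetical; choose them for your own design.
    """
    return [g for g in counts if sum(c >= min_count for c in g) >= min_samples]


# Toy data: 5 genes x 2 samples, sample 2 sequenced about twice as deep.
counts = [[100, 200], [50, 100], [10, 20], [0, 3], [1, 0]]

sf = size_factors(counts)   # computed on all genes, before any filtering
kept = prefilter(counts)    # the two near-zero genes are dropped here
```

In this toy example the size factors are estimated on the full table first and the low-count genes are removed afterwards, which mirrors the order suggested above. Note that the genes containing zeros do not contribute to the size factors at all in this sketch.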
What do you mean by "run QC"?
I thought quality control was cleaning the data of uninformative genes. Are there any other steps I need to do? Could you please explain? I'm new at this. Thanks!