Question

RNA seq

6 months ago

Sudip • 0

I am doing a transcriptomics study with 120 samples, and I want to know what criteria I should use to filter counts. Below is the R code I am using now. I think I may need to raise the count threshold to a higher number, such as 50, because after differential gene expression analysis some genes show very high log2 fold changes despite having a low base mean. Does anyone have experience dealing with datasets this large?

keep <- rowSums(counts(dds)) >= 5
dds <- dds[keep,]

DESeq2 • 446 views

Updated 6 months ago by Ram 44k • written 6 months ago by Sudip • 0

Cross-posted on BioC support: https://support.bioconductor.org/p/9158559/

Please don't do this. You're asking multiple communities to invest effort in solving your problem, and most people in either community won't know that you've also sought help elsewhere.

6 months ago by Ram 44k

Ram · Answer 1 · 2024-05-30

From the DESeq2 vignette:

Pre-filtering
"While it is not necessary to pre-filter low count genes before running the DESeq2 functions, there are two reasons which make pre-filtering useful: by removing rows in which there are very few reads, we reduce the memory size of the dds data object, and we increase the speed of count modeling within DESeq2. It can also improve visualizations, as features with no information for differential expression are not plotted in dispersion plots or MA-plots."

DESeq2 accounts for the distribution of your data, so beyond the practical reasons listed above, I wouldn't worry too much about pre-filtering. After running DESeq(), you can filter the results table to exclude any genes you don't want to carry into visualisation or reporting. In principle, pre-filtering isn't a strict requirement.
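As a rough illustration of filtering after the fact, here is a minimal base R sketch. It assumes the results have been converted to a data frame, for example with as.data.frame(results(dds)), which has baseMean, log2FoldChange and padj columns. The toy values and the cutoffs (min_base_mean, alpha) are hypothetical and only there to make the snippet runnable on its own.

```r
# Toy stand-in for as.data.frame(results(dds)); all values are made up.
res_df <- data.frame(
  row.names      = c("geneA", "geneB", "geneC", "geneD"),
  baseMean       = c(3.2, 850.0, 12.5, 410.0),
  log2FoldChange = c(7.9, 1.4, -6.1, -2.2),
  padj           = c(0.01, 0.001, 0.04, 0.20)
)

# Hypothetical cutoffs; choose values that make sense for your own data.
min_base_mean <- 20
alpha <- 0.05

# Keep genes that are significant AND have a reasonable mean expression.
# is.na() guards against the NA adjusted p-values that DESeq2 can report.
reportable <- res_df[!is.na(res_df$padj) &
                       res_df$padj < alpha &
                       res_df$baseMean >= min_base_mean, ]

print(reportable)
```

In this toy table, the two genes with large fold changes but low base means (geneA and geneC) drop out of the reported set, while the fitted model itself is untouched.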

My advice would be to try a few thresholds, compare the outputs, and see whether the results change noticeably.
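One way to compare thresholds before rerunning anything is simply to count how many genes survive each one. The sketch below uses a simulated count matrix in place of counts(dds), so every number in it (gene count, means, dispersion, thresholds, smallest_group) is an assumption for illustration. It also includes a per-sample rule of the kind the DESeq2 vignette describes (at least 10 counts in some minimum number of samples), which may be more meaningful than a row-sum cutoff when there are 120 samples: a row sum of 50 is still less than half a read per sample on average.

```r
set.seed(1)

# Simulated genes x samples count matrix; with real data use counts(dds).
n_genes   <- 2000
n_samples <- 120
gene_means <- rexp(n_genes, rate = 1 / 20)  # wide spread of expression levels
counts_mat <- matrix(
  rnbinom(n_genes * n_samples,
          mu = rep(gene_means, times = n_samples), size = 2),
  nrow = n_genes, ncol = n_samples
)

# Rule 1: total count across all samples must reach a threshold.
thresholds <- c(5, 10, 50)
kept_by_rowsum <- sapply(thresholds,
                         function(t) sum(rowSums(counts_mat) >= t))
names(kept_by_rowsum) <- thresholds

# Rule 2: at least 10 counts in at least `smallest_group` samples.
smallest_group <- 10  # hypothetical size of the smallest experimental group
kept_by_group <- sum(rowSums(counts_mat >= 10) >= smallest_group)

print(kept_by_rowsum)
print(kept_by_group)
```

If the number of retained genes barely moves between thresholds, the choice is unlikely to matter much; if it moves a lot, it is worth rerunning the analysis for each candidate and comparing the result tables.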