18 months ago
kalavattam
Apparent "wings" in volcano plots (for example, as seen here) are evidence for the relationship between fold change and p-value when expression is low in one condition and there are few replicates. If one wishes to remove or minimize these, that can be done by pre-filtering the counts matrix (for example, excluding rows with partial or complete row sums/means/etc. below some user-defined threshold) and/or performing shrinkage for the effect size estimates.
My main question is this: Is it a problem/mistake to not remove or otherwise minimize "wings"?
Follow-up questions:
- If it is necessary to address these "wings", is it suitable to do shrinkage without counts-matrix filtering or vice versa?
- Any advice on setting thresholds for counts-matrix filtering in a non-arbitrary way?
The stats for these genes are often not reliable. If the difference in mean expression is due to a few outliers, then it is best to filter them out. That saves you from spurious calls. Is filtering a problem?
Thanks, no, filtering is not a problem. Do you have a preferred way to go about it? Maybe something as straightforward as this?
In reading over things, it seems edgeR::filterByExpr() could be useful here.
Do you recommend doing shrinkage too?
I've found shrinkage mildly difficult to explain to end users and generally forego it, but providing an lfcThreshold during testing can be really useful for filtering as well.
You could manually filter as you describe or use the edgeR approach, which generally works quite well. If you use DESeq2, it won't provide significance values for genes that it considers too lowly expressed to reasonably test (though it feels pretty conservative in what it considers "expressed enough").
Thanks, yeah, we're relying on the independent filtering performed by DESeq2 but still see "wings" in some of our plots.
If that doesn't remove them, try
Change the thresholds of 10 counts in at least 3 samples as desired.
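A minimal sketch of a filter of this kind, assuming a rule of "at least 10 counts in at least 3 samples". It uses base R on a made-up counts matrix so it runs without Bioconductor; the names counts, min_count, and min_samples are illustrative, not taken from the thread.

```r
# Toy counts matrix: rows are genes, columns are samples (values are made up).
counts <- matrix(
  c( 0,  0,  1,  0, 25, 40,   # geneA: >= 10 counts in only 2 samples
    12, 15, 11,  0,  0,  2,   # geneB: >= 10 counts in exactly 3 samples
    50, 60, 55, 70, 65, 80,   # geneC: well expressed in every sample
     0,  1,  0,  2,  0,  1),  # geneD: essentially unexpressed
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("s", 1:6))
)

min_count   <- 10  # a sample "counts" toward a gene if it has at least this many reads
min_samples <- 3   # number of such samples required (e.g. the smallest group size)

# For each gene, count the samples meeting min_count, then keep genes
# that reach min_samples.
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

print(rownames(filtered))

# With a DESeqDataSet (here assumed to be called dds), the same rule would be:
#   keep <- rowSums(counts(dds) >= 10) >= 3
#   dds  <- dds[keep, ]
```

On this toy matrix, geneA and geneD are dropped and geneB and geneC are kept. Setting min_samples to the size of the smallest experimental group is a common starting point, since a gene expressed in only one condition can still pass.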
I like filterByExpr as it is automated and generally works rather well. Shrinkage is mainly for visualization and ranking, so it is not critical but nice to have. Since the shrinkage is not part of the testing procedure (unless you use lfcShrink with an lfc threshold), the main filter should be the p-value anyway.