Hi, could you explain how it's possible to have large FCs and adj.pvalues equal to "NA"? The counts table doesn't show any missing or aberrant values. Thanks!
DESeq2 gives a few reasons in their documentation.
Note on p-values set to NA: some values in the results table can be set to NA for one of the following reasons:
- If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.
- If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below
- If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA. Description and customization of independent filtering is described below
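The three cases in the note can be told apart from the baseMean, pvalue and padj columns of the results table. Here is a minimal base R sketch of that check. The function name classify_na and the toy values are made up for illustration; on real data you would pass as.data.frame(results(dds)) instead.

```r
# Label each row with the most likely reason its padj is NA,
# following the three cases in the DESeq2 note.
# `res` is any data frame with baseMean, pvalue and padj columns.
classify_na <- function(res) {
  reason <- rep("padj available", nrow(res))
  # p-value was computed but padj is missing: independent filtering
  reason[is.na(res$padj) & !is.na(res$pvalue)] <- "independent filtering (low mean count)"
  # p-value itself is missing although the gene has counts: Cook's outlier
  reason[is.na(res$pvalue) & res$baseMean > 0] <- "Cook's distance outlier"
  # no counts in any sample
  reason[res$baseMean == 0] <- "all counts zero"
  reason
}

# Toy values, made up purely for illustration
toy <- data.frame(
  baseMean = c(0, 10.7, 2.1, 500),
  pvalue   = c(NA, NA, 0.03, 1e-8),
  padj     = c(NA, NA, NA, 1e-6),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)
toy$reason <- classify_na(toy)
print(toy)
```

So if a gene has a non-zero baseMean and a p-value but no adjusted p-value, independent filtering is the likely cause; if the p-value is NA as well, the Cook's distance outlier flag is the likely cause.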
There are several stages at which an NA might be included in the adj.pvalues column in DESeq2 output.
The two most likely are low expression filtering and outlier exclusion.
- Low expression filtering (independent filtering): this is applied when independentFiltering = TRUE (the default) in the call to results(). For more details see this part of the DESeq2 user guide.
- Outlier exclusion: this can be turned off with cooksCutoff=FALSE in the call to results(). For more details see this part of the DESeq2 user guide.
You don't have to use independent filtering, but using it maximises the number of significant genes you will get. This does not mean that none of the genes it filters out will be significant, but rather that filtering out those genes means that a larger number of other genes become significant.
The other thing to be aware of is that the default setting for significance in DESeq2 is padj<0.1, and this is the level that independentFiltering uses as its benchmark. If you are using 0.05, then you should set alpha=0.05 in your call to results.
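To see how these options change the number of NA values in padj, something like the following sketch could be used. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet() as a stand-in for real data; the object names (dds, res_default and so on) are only illustrative.

```r
library(DESeq2)

set.seed(1)
# Simulated counts (3 vs 3 samples) standing in for a real dataset
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
dds <- DESeq(dds)

# Defaults: Cook's outlier flagging and independent filtering both on,
# with the filtering threshold tuned for alpha = 0.1
res_default <- results(dds)

# Same filters, but tuned for a 0.05 significance cutoff
res_alpha05 <- results(dds, alpha = 0.05)

# Both filters switched off: padj is NA only where the p-value itself is NA
res_off <- results(dds, cooksCutoff = FALSE, independentFiltering = FALSE)

# Number of genes with padj = NA under each setting
na_counts <- c(
  default     = sum(is.na(res_default$padj)),
  alpha05     = sum(is.na(res_alpha05$padj)),
  filters_off = sum(is.na(res_off$padj))
)
print(na_counts)
```

Comparing the counts shows how many genes lose their adjusted p-value to the two filters. Switching the filters off is useful for diagnosis, but as noted above, independent filtering usually increases the number of genes that pass the significance threshold.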
Thanks, but in my case not all samples have zero counts, only those from one condition:

log2FC            adj.pValue  Cond1             Cond2             Cond3             Mock1  Mock2  Mock3
6.88709872531625  NA          5.31277170703943  27.6763981516444  31.2711272029594  0      0      0

So I was expecting to get a low adj.pValue, since there is a clear difference between the Mock and Cond counts. Maybe the problem comes from "Cond1", which has a lower count than "Cond2" and "Cond3".
That only addresses point 1 in their documentation. See @i.sudbery's answer on how points 2 and 3 factor in.