How do I subset gene loci with no difference from DESeq2 output?
6.8 years ago
MAPK

I have a DESeq2 log fold change output file for the comparison of sample 1 vs sample 2, with the following columns:

"Loci"  "baseMean"  "log2FoldChange"    "lfcSE" "stat"  "pvalue"    "padj"

I understand that I can extract differentially expressed loci from this table using a padj significance threshold of < 0.1. Can someone please tell me how to separate **upregulated genes**, **downregulated genes**, and genes with **no difference (i.e. conserved loci)**? What cut-off values (specifically for padj) should I use to extract each of these subsets? Also, many loci have NA for padj; what does NA mean in this case?


Just adding a tip for your no-difference question. In DESeq2 you can actually test for no differential expression: see the section "Tests of log2 fold change above or below a threshold" in the vignette.
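For reference, the R call from that vignette section looks like `results(dds, lfcThreshold = 1, altHypothesis = "lessAbs")`, where `dds` is your DESeqDataSet and a threshold of 1 (a 2-fold band) is only an example value. If you only have the exported table, here is a rough Python sketch of the same idea using the `log2FoldChange` and `lfcSE` columns: two one-sided Wald tests against ±threshold, keeping the larger p-value, then Benjamini-Hochberg adjustment. It assumes the columns hold unshrunken estimates with their standard errors and uses a normal approximation, so treat its numbers as approximate rather than identical to what DESeq2 itself reports.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def less_abs_pvalue(lfc: float, lfc_se: float, threshold: float) -> float:
    """Approximate p-value for the alternative |LFC| < threshold.

    Two one-sided Wald tests with a normal approximation:
      upper test, H0: LFC >= +threshold (small p when LFC is well below +threshold)
      lower test, H0: LFC <= -threshold (small p when LFC is well above -threshold)
    Both must reject, so the combined p-value is the larger of the two.
    """
    p_upper = _STD_NORMAL.cdf((lfc - threshold) / lfc_se)
    p_lower = _STD_NORMAL.cdf((-threshold - lfc) / lfc_se)
    return max(p_upper, p_lower)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so the result is monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up (locus, log2FoldChange, lfcSE) values, for illustration only.
rows = [("locusA", 0.05, 0.10), ("locusB", 0.05, 1.50), ("locusC", 2.00, 0.20)]
pvals = [less_abs_pvalue(lfc, se, threshold=1.0) for _, lfc, se in rows]
padj = benjamini_hochberg(pvals)
conserved = [name for (name, _, _), q in zip(rows, padj) if q < 0.1]
print(conserved)  # ['locusA']
```

locusB shows why "not significant" is not the same as "unchanged": its estimate is as small as locusA's, but its standard error is too large to rule out a 2-fold change.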

6.8 years ago

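If the value in "log2FoldChange" is negative, the gene is downregulated; if it is positive, the gene is upregulated. The direction is relative to the reference (denominator) level of the comparison, so check which sample DESeq2 used as the baseline.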

The NAs are usually genes with so few counts that the software can't draw any conclusions about their expression. You'll have to ignore those.
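Putting those two points together, a minimal Python sketch (standard library only) that reads the results table, sets NA rows aside, and splits the significant loci (padj < 0.1, as in the question) by the sign of `log2FoldChange`. It assumes a tab-separated file with the header shown in the question and `NA` for missing values; the inline table is made-up data standing in for that file.

```python
import csv
import io

# Made-up stand-in for the exported DESeq2 table (tab-separated, quoted header).
EXAMPLE_TSV = (
    '"Loci"\t"baseMean"\t"log2FoldChange"\t"lfcSE"\t"stat"\t"pvalue"\t"padj"\n'
    '"locus1"\t512.3\t2.10\t0.30\t7.00\t2.6e-12\t1.1e-10\n'
    '"locus2"\t230.8\t-1.40\t0.40\t-3.50\t4.7e-04\t3.0e-03\n'
    '"locus3"\t88.1\t0.20\t0.35\t0.57\t0.57\t0.80\n'
    '"locus4"\t2.4\t0.50\t1.90\t0.26\t0.79\tNA\n'
)


def parse_value(text: str):
    """Return a float, or None for DESeq2's 'NA'."""
    return None if text == "NA" else float(text)


def split_by_direction(rows, padj_cutoff: float = 0.1):
    """Return (up, down, na) lists of locus names."""
    up, down, na = [], [], []
    for row in rows:
        lfc = parse_value(row["log2FoldChange"])
        padj = parse_value(row["padj"])
        if lfc is None or padj is None:
            na.append(row["Loci"])  # cannot be called either way
        elif padj < padj_cutoff and lfc > 0:
            up.append(row["Loci"])
        elif padj < padj_cutoff and lfc < 0:
            down.append(row["Loci"])
    return up, down, na


rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
up, down, na = split_by_direction(rows)
print(up, down, na)  # ['locus1'] ['locus2'] ['locus4']
```

For a real file, replace the `io.StringIO(...)` with an open file handle, e.g. `open("results.tsv", newline="")` (the file name is hypothetical). Loci that are neither NA nor significant, like locus3, end up in none of the three lists.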


Thanks. But what about those with no change? How do I subset the loci that are statistically conserved?


There is no standard cut-off. If you set the cut-off at log2 fold change <= -2 for downregulation and >= +2 for upregulation (coupled with some cut-off on the FDR-adjusted p value), then you are implying that anything between -2 and +2 is neither up- nor downregulated.
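A minimal Python sketch of that rule, using the example cut-offs from this comment (|log2 fold change| >= 2) and the question's padj < 0.1; both values are assumptions to tune, not recommendations. The middle category only means "not called" by these thresholds, not statistically shown to be unchanged.

```python
from typing import Optional


def classify(lfc: Optional[float], padj: Optional[float],
             lfc_cutoff: float = 2.0, padj_cutoff: float = 0.1) -> str:
    """Label one locus as 'up', 'down', 'neither' or 'NA'."""
    if lfc is None or padj is None:
        return "NA"  # DESeq2 reported NA for this locus
    significant = padj < padj_cutoff
    if significant and lfc >= lfc_cutoff:
        return "up"
    if significant and lfc <= -lfc_cutoff:
        return "down"
    # Small fold change, or not significant: not called either way.
    return "neither"


for lfc, padj in [(2.5, 0.01), (-3.0, 0.001), (1.0, 0.001), (3.0, 0.5), (0.5, None)]:
    print(lfc, padj, classify(lfc, padj))
```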

Z-scores may be an additional way to gauge which genes are unchanged. For example, if a gene's Z-scores are all below 1 in absolute value, its expression in every sample lies within 1 standard deviation of that gene's mean across all samples.
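A minimal sketch of a per-gene Z-score in Python, assuming `values` are normalized, log-scale expression values for one gene across all samples (for example exported from DESeq2's `vst()` or `rlog()`). Each value is centred on the gene's mean and divided by the gene's own sample standard deviation.

```python
from statistics import mean, stdev


def gene_zscores(values: list[float]) -> list[float]:
    """Z-score each sample's expression within a single gene."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return [0.0] * len(values)  # perfectly flat gene: no deviation at all
    return [(v - mu) / sd for v in values]


def all_within_one_sd(values: list[float]) -> bool:
    """True if every sample lies less than 1 SD from the gene's mean."""
    return all(abs(z) < 1 for z in gene_zscores(values))


print(gene_zscores([1.0, 2.0, 3.0]))               # [-1.0, 0.0, 1.0]
print(all_within_one_sd([10.0, 10.2, 9.9, 10.1]))  # False
```

Because each gene is scaled by its own standard deviation, a nearly flat gene like the second example can still have |Z| > 1. Z-scores describe the pattern across samples rather than the size of the change, so they work best alongside the fold-change and padj filters.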

6.8 years ago
igor

If you are using padj < 0.1 as significant, then the rest are not significant. Of course, "not significant" can mean either that the gene is genuinely unaltered or that there is not enough information to make the call.

Regarding NAs, that is actually described in the vignette:

If within a row, all samples have zero counts, the baseMean column will be zero, and the log2 fold change estimates, p value and adjusted p value will all be set to NA.

If a row contains a sample with an extreme count outlier then the p value and adjusted p value will be set to NA. These outlier counts are detected by Cook’s distance. Customization of this outlier filtering and description of functionality for replacement of outlier counts and refitting is described below

If a row is filtered by automatic independent filtering, for having a low mean normalized count, then only the adjusted p value will be set to NA.
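The three rules quoted above can be turned into a small Python helper for labelling why a row has NAs. It is a sketch that mirrors only those three documented cases, assuming `None` stands for an NA value; it does not try to cover anything else.

```python
from typing import Optional


def na_reason(base_mean: float, pvalue: Optional[float], padj: Optional[float]) -> str:
    """Explain a DESeq2 results row using the three NA rules from the vignette."""
    if base_mean == 0:
        return "all counts zero"  # LFC, p value and padj are all NA
    if pvalue is None:
        return "count outlier (Cook's distance)"  # p value and padj are NA
    if padj is None:
        return "independent filtering"  # only padj is NA (low mean count)
    return "tested"


for row in [(0.0, None, None), (120.5, None, None), (2.3, 0.41, None), (500.0, 1e-8, 1e-6)]:
    print(row, na_reason(*row))
```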
