Entering edit mode
5.2 years ago
juan.crescente
▴
110
I'm using DESeq2 for a differential expression analysis of sRNA data. The R code looks like this:
library("DESeq2")
alpha <- 0.05
setwd(dirname(rstudioapi::getActiveDocumentContext()$path));
getwd()
data_path <- "/home/juan/Desktop/juan/bio/mirna_mrcv/data/counts.valid.csv"
result_path <- "/home/juan/Desktop/juan/bio/mirna_mrcv/data/"
# Load data
countdata <- read.table(data_path,header=TRUE,sep="\t")
result_file = paste(result_path,"mirna.deg.csv",sep="");
#DROP unique miRNA clusters
# I don't know why I have to do this first, otherwise RStudio hangs
countdata <- countdata[!is.na(countdata$Name),]
row.names(countdata) <- countdata$Name
countdata <- subset(countdata, select = -c(main))
countdata <- subset(countdata, select = -c(Name))
countdata <- subset(countdata, select = -c(Locus))
countdata <- as.matrix(countdata)
head(countdata)
# Assign condition (first four are controls, second four contain the expansion)
(condition <- factor(c("control","control","treatment","treatment")))
# Analysis with DESeq2 ----------------------------------------------------
# Create a coldata frame and instantiate the DESeqDataSet. See ?DESeqDataSetFromMatrix
(coldata <- data.frame(row.names=colnames(countdata), condition))
levels(coldata$condition)
coldata
dds_m <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
# Run the DESeq pipeline
dds <- DESeq(dds_m)
res <- results(dds, alpha=alpha)
res <- res[!is.na(res$padj), ]
res <- res[res$padj <= alpha, ]
table(res$padj <= alpha)
## Order by adjusted p-value
res <- res[order(res$padj), ]
## Merge with normalized count data
resdata <- merge(as.data.frame(res), as.data.frame(counts(dds, normalized=TRUE)), by="row.names", sort=FALSE)
resdata
names(resdata)[1] <- "Name"
head(resdata)
## Write results
write.csv(resdata, file=result_file, row.names=FALSE)
result_file
For this, I'm filtering using padj < 0.05, which gives me 11 results.
When I create the volcano plot using EnhancedVolcano, I use padj in y axis and filter by 0.05 Which is supposed to filter with pCutOff ("In this case, the cutoff for the P value then relates to the adjusted P value")
library(EnhancedVolcano)
dds <- DESeq(dds_m, betaPrior=FALSE)
res1 <- results(dds)
res1
EnhancedVolcano(res1,
lab = rownames(res1),
x = 'log2FoldChange',
y = 'padj',
pCutoff = 0.05)
But I'm not undestanding where my 11 elements should be, I guess from the horizontal line but how is DESeq2 filtering by log 2 fold change, how is it selecting where to cut?
Hey, I developed EnhancedVolcano. Yes, the other genes likely have absolute log [base 2] fold changes < 1. You can check in your
res1
objecthey thanks for answering! Should I select all values below padj<0.05 no matter the log2foldchange? Because DESeq2 is selecting all below that and it's inconsisten with the volcano plot, which seems to suggest both variables
You could set the following:
...however, I am not sure how that would look.
Can you point me what is the correct way to do the analysis with DESeq2? Should I use a FC cutoff prior to doing the volcano plot in DESeq2?
Typically, we use FDR Q<0.05 (adjusted P<0.05) __AND__ absolute log2FC>2 as cutoffs in a differential expression study. Volcano plots are then usually generated using the nominal (un-adjusted) p-values. In your experiment, however, you have nothing that has an absolute log2FC>2 ... ...
In your plot, also, if you plot the adjusted p-values, then you need to change the value of
ylab
to:Also, in your DESeq2 code, if you use
betaPrior = FALSE
, then you should also make use of thelfcshrink()
function; alternatively, just use:For example, take a look at my 2 previous answers:
very helpful! thank you