I performed differential gene expression analysis using EgdeR on RNAseq data and using the DE i generated volcano plot. The volcano plot does not look like a typical volcano plot. I have made volcano plots before and they were normal plots.
I have searched a lot to find the reason why it could be so but unfortunately could not find a good reason. Could someone tell me why this could be?
I need to understand why there is a separate line of genes detached from the rest of the volcano plot on the right corner.
Analysis setup:
Organism : Human
Control samples : 3
Infected samples : 3
RUVseq code:
exprs1<- K2_T2
filter<-apply(exprs1,1,function(x) length(x[x>5])>=2)
filtered<-exprs1[filter,]
head(filtered)
x1<-as.factor(rep(c("M","T2"), each=3))
colnames(exprs1) <- c("K1", "K2", "K3", "T2_1", "T2_2","T2_3")
set<-newSeqExpressionSet(as.matrix(filtered),phenoData=data.frame(x1, row.names=colnames(filtered)))
head(set)
colors<-brewer.pal(3, "Set1")
plotRLE(set, outline=FALSE, ylim=c(-10,4),col=colors[x1], main = "Relative Log Expression K/T2")
plotPCA(set, col=colors[x1], cex=1.2, main = "Principal Components Plot K/T2")
controls <- rownames(set)
differences <- matrix(data=c(1:3, 4:6), byrow=TRUE, nrow=2)
seqRUVs <- RUVs(set, controls, k=1, differences)
plotRLE(seqRUVs, outline=FALSE, ylim=c(-10,10),col=colors[x1],main = "Relative Log Expression K/T2")
plotPCA(seqRUVs, col=colors[x1], cex=1.2,main = "Principal Components Plot K/T2")
design<-model.matrix(~x1, data=pData(seqRUVs))
EdgeR code:
y <- DGEList(counts=counts(seqRUVs), group=x1)
y <- calcNormFactors(y, method="upperquartile")
y <- estimateGLMCommonDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
dispersion.true <- 0.2
fit <- glmFit(y, design,dispersion = dispersion.true)
lrt <- glmLRT(fit, coef=2)
topTags(lrt)
row.names(lrt)=gsub("\\..*","",row.names(lrt)) #to remove decimals after the ENSEMBL IDs
lrt$table$Symbol <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="SYMBOL",keytype="ENSEMBL",multiVals="first")
lrt$table$EntrezID <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="ENTREZID",keytype="ENSEMBL",multiVals="first")
lrt$table$Uniprot <- mapIds(org.Hs.eg.db,keys=row.names(lrt),column="UNIPROT",keytype="ENSEMBL",multiVals="first")
result<-topTags(lrt,n=nrow(lrt), adjust.method="BH")
mart <- useEnsembl(biomart = "ensembl",
dataset = "hsapiens_gene_ensembl",
mirror = "useast")
IDs<-getBM(attributes=c("ensembl_gene_id","entrezgene_id"), "ensembl_gene_id",row.names(result$table) , mart)
head(IDs)
IDs <-as.data.frame(IDs[!duplicated(IDs$ensembl_gene_id),])
rownames(IDs) <- IDs[,1]
IDs$ensembl_gene_id<-NULL
Check<-merge(result$table, IDs, by=0)
head(Check)
result<-Check[order(Check$FDR, decreasing = FALSE), ]
head(result)
Volcano plot:
library(dplyr)
library(ggplot2)
library("ggrepel")
res<-cbind(result$Symbol,result$logFC, result$PValue,result$FDR)
res<-as.data.frame(res)
head(res)
colnames(res) <- c("Gene", "Fold", "pvalue", "FDR")
res$Fold <-as.numeric(as.character(res[,2]))
res$pvalue <-as.numeric(as.character(res[,3]))
res$FDR <-as.numeric(as.character(res[,4]))
head(res)
str(res)
nrow(res)
res<-res[complete.cases(res), ]
results<-res
results = mutate(res, sig=ifelse(res$FDR<0.05, "FDR<0.05", "Not Sig"))
head(results)
p = ggplot(results, aes(Fold, -log10(FDR))) +geom_point(aes(col=sig)) +scale_color_manual(values=c("red", "black"))
p
Are these nominal or corrected p-values ?
These are corrected p-values
IMO that's why you have this "line" of separate genes appears. Could you try your volcano plot with nominal p-values. FDR can still be used for coloring.
Does not change the plot.
Ok then you should add more information on how you generate the plot then (e.g. egedR command, how many samples, experimental desig, etc... ) in order that we can help you.
More details about the code added!
What is your organism? Are you sure you did not apply any filtering on fold-change? Basically you have a set of (I bet lowly expressed, or probably even co-expressed) genes which have a really big fold change. You can filter your data with p-val and and fold-change to see what are these outliers, maybe it will give you some clues. And actually you have (at least) two of these outliers, there is another line closer to the main trunk of the volcano plot.