Try this with base plotting. (Note: I created two data frames with 100 genes each; the two data frames share 10 common genes with identical FDR and logFC.) Common genes are colored red and labelled; the rest are dark green and green.
input:
set.seed(100)
edge = data.frame(
gene = paste0("gene",sample(100)),
logFC = c(rnorm(80,0,1),rnorm(20,0,12)),
logFDR = rnorm(100,mean=0.05, sd=0.02)
)
dsq =data.frame(
gene = paste0("gene",sample(100)),
logFC = c(rnorm(70,0,1),rnorm(30,0,12)),
logFDR = rnorm(100,mean=0.05, sd=0.01)
)
set.seed(200)
cf=edge[abs(edge$logFC)>2,][sample(nrow(edge[abs(edge$logFC)>2,]),10),]
dsq[dsq$gene %in% cf$gene,]=cf
edge.sorted=edge[with(edge,order(gene)),]
dsq.sorted=dsq[with(dsq,order(gene)),]
plot(
x = edge.sorted$logFC,
y = edge.sorted$logFDR,
col = "darkgreen",
pch = 16,
cex=2,
xlab="Log(2) Fold Change",
ylab="FDR"
)
# threshold lines are drawn after plot(), not passed to it as an argument
abline(v=c(-2,2),h=c(0,0.05), col="red", lty=3,lwd=3)
points(
x = dsq.sorted$logFC,
y = dsq.sorted$logFDR,
col = "green",
pch = 16,
cex=2
)
# one logical index for the common genes: logFC and FDR must both match
# (rows of the two sorted data frames are aligned by gene)
common = edge.sorted$logFC == dsq.sorted$logFC & edge.sorted$logFDR == dsq.sorted$logFDR
points(
x = edge.sorted$logFC[common],
y = edge.sorted$logFDR[common],
col = "red",
pch = 16,
cex=2
)
text(
x = edge.sorted$logFC[common],
y = edge.sorted$logFDR[common],
labels = edge.sorted$gene[common],
cex = 2,
pos=1,
col = "red"
)
The same in ggplot2 (the code that creates the merged data frame is recycled from @russh; the data frames come from the code above):
input:
library(dplyr)
library(ggplot2)
dfm = merge(edge.sorted, dsq.sorted, by = "gene")
dfm1 = dfm[dfm$logFC.x == dfm$logFC.y & dfm$logFDR.x == dfm$logFDR.y, ]
dfm1
head(dfm)
head(dfm1)
ggplot(dfm) +
geom_point(
data = dfm,
aes(x = logFC.x, y = logFDR.x),
color = "green",
cex = 3
) +
geom_point(
data = dfm,
aes(x = logFC.y, y = logFDR.y),
color = "lightgreen",
cex = 3
) +
geom_point(
data = dfm1,
aes(x = logFC.x, y = logFDR.x),
color = "blue",
cex = 3
) +
geom_text(
data = dfm1,
aes(x = logFC.x, y = logFDR.x, label = gene),
hjust = 1,
vjust = 2
) +
theme_bw() +
xlab("Log(2) fold change") +
ylab("FDR") +
geom_vline(
xintercept = 2,
col = "red",
linetype = "dotted",
size = 1
) +
geom_vline(
xintercept = -2,
col = "red",
linetype = "dotted",
size = 1
) +
geom_hline(
yintercept = 0.05,
col = "red",
linetype = "dotted",
size = 1
)
Please provide example data.
Dear zx8754. Hi and thank you for your help.
I did not quite understand what you mean by "example data". If you mean the structure of counts.matrix, it could be any counts, but the head of my "data" in the code above is as below:
For a given gene, how will you illustrate the connection between its result in DESeq and its (dependent) result in edgeR?
Gene A will get a logFC and a p-value twice, once from edgeR and once from DESeq, so you cannot plot both at the same time in a single volcano plot.
Yes, you can: you just plot two different points for the same gene. Whether it's meaningful to do so is up for discussion.
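To make the "two points per gene" idea concrete, here is a minimal base R sketch that draws both results for each gene and joins them with a line segment. The data are simulated, and the data frame and column names (`edge_res`, `dsq_res`, `logFC`, `FDR`) are assumptions for illustration, not the output format of either package.

```r
# Simulated results for the same genes from two methods (hypothetical data)
set.seed(1)
n <- 20
edge_res <- data.frame(gene = paste0("gene", 1:n),
                       logFC = rnorm(n, 0, 2),
                       FDR = runif(n, 0.001, 0.5))
# Second method: same genes, slightly perturbed values
dsq_res <- data.frame(gene = edge_res$gene,
                      logFC = edge_res$logFC + rnorm(n, 0, 0.3),
                      FDR = pmin(1, edge_res$FDR * exp(rnorm(n, 0, 0.3))))

# Merge by gene so that each row holds both results for one gene
both <- merge(edge_res, dsq_res, by = "gene", suffixes = c(".edgeR", ".DESeq2"))

# Volcano coordinates: logFC on x, -log10(FDR) on y
x1 <- both$logFC.edgeR
y1 <- -log10(both$FDR.edgeR)
x2 <- both$logFC.DESeq2
y2 <- -log10(both$FDR.DESeq2)

# Empty plot sized to hold both point sets, then segments and points
plot(c(x1, x2), c(y1, y2), type = "n",
     xlab = "Log2 fold change", ylab = "-log10(FDR)")
segments(x1, y1, x2, y2, col = "grey60")
points(x1, y1, pch = 16, col = "darkgreen")
points(x2, y2, pch = 16, col = "orange")
legend("topright", legend = c("edgeR", "DESeq2"),
       pch = 16, col = c("darkgreen", "orange"))
```

Long segments mark genes on which the two methods disagree most; with thousands of genes the segments would likely need to be restricted to a subset of interest.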
Dear russhh and Wouter, Hi. Here is the problem:
I have done DEG analysis with both edgeR and DESeq2, and I checked the overlap between the two packages in both of my conditions (condition1 and condition2) using a Venn diagram. I tried to annotate the common/overlapping transcripts reported by both the exact test and the GLM method. I also checked the results using SARTools. So now I have two volcano plots, one for edgeR and one for DESeq2, with about 30% overlap of DEGs in condition1 and 50% overlap in condition2.
I was wondering if I can show all these in a single volcano plot using 3 colours.
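One way the three-colour idea could be sketched in base R: classify each gene by which method calls it significant, then colour by that class. Everything here is an assumption for illustration: the data are simulated, `classify_calls` is a hypothetical helper, the thresholds (|logFC| > 2, FDR < 0.05) follow the plots above, and each gene is placed at its edgeR coordinates, so a "DESeq2 only" gene may sit below the FDR line.

```r
# Hypothetical helper: label each gene by the method(s) that call it significant
classify_calls <- function(sig_edger, sig_deseq2) {
  status <- rep("neither", length(sig_edger))
  status[sig_edger & !sig_deseq2] <- "edgeR only"
  status[!sig_edger & sig_deseq2] <- "DESeq2 only"
  status[sig_edger & sig_deseq2] <- "both"
  factor(status, levels = c("neither", "edgeR only", "DESeq2 only", "both"))
}

# Simulated results (not real edgeR/DESeq2 output)
set.seed(2)
n <- 300
base_fdr <- runif(n, 1e-4, 0.3)
res <- data.frame(
  logFC = rnorm(n, 0, 2),
  FDR.edgeR = base_fdr,
  FDR.DESeq2 = pmin(1, base_fdr * exp(rnorm(n, 0, 0.5)))
)

# Same cut-offs for both methods
sig_e <- abs(res$logFC) > 2 & res$FDR.edgeR < 0.05
sig_d <- abs(res$logFC) > 2 & res$FDR.DESeq2 < 0.05
res$status <- classify_calls(sig_e, sig_d)

# Three colours for significant genes, grey for the rest
cols <- c("neither" = "grey70", "edgeR only" = "darkgreen",
          "DESeq2 only" = "orange", "both" = "red")
plot(res$logFC, -log10(res$FDR.edgeR), pch = 16,
     col = cols[as.character(res$status)],
     xlab = "Log2 fold change", ylab = "-log10(FDR), edgeR")
abline(v = c(-2, 2), h = -log10(0.05), lty = 3)
legend("top", legend = names(cols), pch = 16, col = cols)
```

`table(res$status)` gives the counts behind the Venn diagram, so the plot and the overlap numbers come from the same classification.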
Sorry, is it two different contrasts, each of which has been tested using both edgeR and DESeq2?
I have used the same data and the same expression counts produced by Trinity for condition1 and condition2 with both edgeR and DESeq2, and focused on the overlaps, based on the idea that if multiple programs give you the same results, you can be confident that those results do not depend on the particular assumptions made by the programs or statistical tests.
I think the two methods are a bit too similar for that to be appropriate; I'd agree if you were comparing voom and DESeq2, or if you'd subsampled either the counts or the exons and run the two methods on the two partitions. I'm not sure the overlaid volcano would help in choosing between the methods; I'd rather see two MA plots placed next to each other.
Good idea! I will check and compare MA plots, too. Aren't MA plots similar to volcano plots overall?
Not really; an MA plot will show how your differentially expressed genes vary across different levels of average expression for the two different methods.
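As a rough illustration of the side-by-side MA plots suggested above, here is a base R sketch on simulated values. The helper `ma_panel` and all variable names are hypothetical; with real results the x-axis would be each package's own average-expression column (for example `logCPM` in edgeR tables or `baseMean` in DESeq2 results).

```r
# Simulated mean expression and fold changes (not real package output)
set.seed(3)
n <- 500
base_mean <- 10^runif(n, 0, 4)
# Smaller fold-change spread at higher expression, as is typical in MA plots
lfc_edger <- rnorm(n, 0, 2 / sqrt(log10(base_mean) + 1))
lfc_deseq2 <- lfc_edger * 0.8 + rnorm(n, 0, 0.2)

# Hypothetical helper: one MA panel, genes beyond |logFC| > 2 shown in red
ma_panel <- function(mean_expr, lfc, main) {
  plot(log10(mean_expr), lfc, pch = 16, cex = 0.6,
       col = ifelse(abs(lfc) > 2, "red", "grey50"),
       xlab = "log10 mean expression", ylab = "Log2 fold change",
       main = main)
  abline(h = c(-2, 2), lty = 3, col = "red")
  abline(h = 0)
}

# Two panels next to each other, then restore the previous layout
op <- par(mfrow = c(1, 2))
ma_panel(base_mean, lfc_edger, "edgeR (simulated)")
ma_panel(base_mean, lfc_deseq2, "DESeq2 (simulated)")
par(op)
```

Unlike a volcano plot, this view shows whether the genes that exceed the fold-change cut-off are concentrated at low average expression in one method but not the other.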
Okay you can, but no, I don't consider it meaningful :)
When plotting, you need to think about what message you want to convey. In this case, that's quite unclear.
If you are asking me, this approach would first simplify the visualisation of the results (replacing two separate volcano plots and a Venn diagram with a single volcano plot), and it would also make it clear to readers that the core DEGs chosen by the most conservative approach are significantly differentially expressed according to both statistical packages.