Dear Biostars, Hi
I have a volcano plot (obtained from edgeR). I also have some selected annotated genes that I like to highlight them by showing only their name on that plot.
I have used the valuable script/code from Biostars (thank you @WouterDeCoster and @venu and others).
As most of the lines of the first column in my counts.matrix is empty (I have only about 15 names), I received some errors. For example, in the "Head" example below, I have only PAC1 and Gh as Gene name and the other lines have "space" instead of any Gene name!
would you please check my script and my counts.matrix example and help me in this regard? Thanks
NOTE:
1- Head of my counts.matrix file:
Gene external_gene_name logFC logCPM P.Value FDR
PAC1 TRINITY_c0_g2_i4 -13.8732718193368 4.61855606743036 3.03811154383156e-30 1.87897780840196e-24
TRINITY_c5_g3_i4 13.2697070220115 4.01618042373378 1.21936893870205e-27 3.7707094407506e-22
GH TRINITY_c9_g2_i2 11.2269211011449 3.82178000481344 3.20153046019011e-26 6.60015780727773e-21
TRINITY_c0_g1_i1 12.855465908739 3.60335598332695 2.98018910927419e-24 4.60788644555924e-19
TRINITY_c0_g1_i1 -11.8727923903381 2.62735890209779 1.81499535997098e-20 2.24503673057178e-15
TRINITY_c6_g1_i6 10.3243429332541 2.89490936855587 2.16106621247948e-19 2.22758743227662e-14
TRINITY_g3_i2 -12.0236467730875 2.77706530918009 2.63166646685828e-19 2.32514875441625e-14
TRINITY_c3_g2_i6 -11.2400737828536 2.00142608642608 1.40238473598696e-18 1.0841643566014e-13
TRINITY_c1_g1_i1 -9.45052158373113 2.158090406858 1.66695714263313e-18 1.14551257449686e-13
2- my code:
# Load packages
library(dplyr)
library(ggplot2)
library(ggrepel)
# Read data
data <- read.table("trans.counts.matrix.J_vs_M.edgeR.DE_results", header=TRUE)
data$threshold = as.factor(data$FDR < 0.001)
data = mutate(data, sig=ifelse(data$FDR<0.001, "FDR<0.001", "Not Sig"))
p = ggplot(data, aes(logFC, -log10(P.Value))) +
geom_point(aes(col=sig)) +
scale_color_manual(values=c("red", "black"))
p
p+geom_text_repel(data=filter(data, FDR<0.001), aes(label=Gene))
Add another filter to filter out genes (column) without name some thing like df$gene!="" in addition to p-value cutoff. This would show only those genes with names and with FDR below the cutoff.