I have gene expression data from which I selected 16 genes, stored in the df4 variable, and tried to make an EnhancedVolcano plot:
library(EnhancedVolcano)
# Specify target genes for labeling
target_genes <- c(
  "ACTN2", "CRYAB", "BMP10",
  "CSRP3", "DES", "FHOD3",
  "FLNC", "LDB3", "MYZAP",
  "MYPN", "MYOZ2", "NEXN",
  "PDLIM3", "PDLIM5",
  "TCAP", "TTN"
)
# Subset data frame to include only target genes
# Assuming your data frame has a column named 'gene' for gene names
df4 <- subset(df3, gene %in% target_genes)
# Create EnhancedVolcano plot with labels for the subset of target genes
p8 <- EnhancedVolcano(
  df4,
  lab = rownames(df4), # Row names of df4 hold the gene symbols used as labels
  x = 'logFC',
  y = 'P.Value',
  title = 'Dilated Cardiomyopathy vs Control',
  pCutoff = 0.05,
  FCcutoff = 0,
  pointSize = 4.0,
  labSize = 3.0
)
p8
# Save the plot; 'filename' is the full name of ggsave's first argument
ggsave(filename = "enhancedvolcano.jpeg", plot = p8)
But the problem is that it does not label the green dots, where only one condition is true:
How can I solve this problem? Do I need to pass other arguments to the EnhancedVolcano function?
Also, rownames(df4) gives the following gene labels:
'PDLIM3' 'FHOD3' 'TTN' 'FLNC' 'LDB3' 'MYOZ2' 'ACTN2' 'CSRP3' 'CRYAB' 'PDLIM5'
Yes, I also tried plotting all genes and highlighting the genes of interest with labels, like this:
but it gives me the following error:
Did you even read the manual section I linked to? I've highlighted the relevant parts.
Yes, I tried this:
and it gives me a plot:
Still, it shows only 5 genes. How can I show all 10 available genes?
They could be right on top of each other. Try the boxedLabels and drawConnectors options. Also, it looks weird that your volcano plot is all green and red. Don't pick meaningless thresholds for logFC/p-value; use sensible thresholds so most of the dots are grey.
What is a sensible threshold in my case?
Maybe use 1?
For upregulated genes, we normally take logFC > 0 and p-value < 0.05, and for downregulated genes, we take logFC < 0 and p-value < 0.05. Is this a standard threshold?
Ram already answered your question:
It is important to understand the meaning of LogFC, and it doesn't seem like you do. LogFC of 0 means fold-change of 1, which is identical expression (or no fold-change between the conditions). To pick a LogFC > 0 literally means that any fold-change greater than 1 will be picked for coloring, and that is going to be pretty much all the points in your plot. Even a fold-change of 1.00001 between the two conditions will be colored, which makes no sense.
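To make the arithmetic concrete, here is a minimal sketch of converting LogFC values to fold changes. It assumes the logFC column is on the log2 scale (as the 2-fold and 2^0.5 figures in this thread imply); the values in `log_fc` are illustrative and not taken from the poster's data.

```r
# Illustrative log2 fold-change values (assumed, not from the real dataset)
log_fc <- c(0, 0.5, 1, -1)

# Assuming log2 scale: fold change = 2 ^ logFC
fold_change <- 2^log_fc

# logFC = 0 is a fold change of 1 (no change); logFC = 1 is a doubling;
# logFC = -1 is a halving
print(data.frame(log_fc = log_fc, fold_change = round(fold_change, 2)))
```

A cutoff of 0 therefore separates "any increase at all" from "any decrease at all", which is why nearly every point ends up colored.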
What Ram suggested, a LogFC > 1, means to color only genes where the difference in expression is at least 2-fold, which makes more sense and in general is an accepted threshold. In practical terms as it relates to your plot, that means only points to the right of +1 on the X-axis will be colored, and only points to the left of -1 on the X-axis will be colored. That should clear the picture so that hopefully your genes of interest are visible.
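Putting the suggestions so far together, the following is a rough sketch rather than a definitive recipe. It plots all genes, uses FCcutoff = 1, and asks for labels only on the genes of interest via selectLab, with boxedLabels and drawConnectors to separate overlapping labels. The data frame is simulated, the shortened `target_genes` vector is only for illustration, and `max.overlaps` is assumed to be available (it is present in recent EnhancedVolcano releases).

```r
# Simulated stand-in for the full results table (df3); all values are made up
set.seed(1)
target_genes <- c("PDLIM3", "FHOD3", "TTN", "FLNC", "LDB3")
n_other <- 500
n_total <- n_other + length(target_genes)
df3 <- data.frame(
  gene    = c(target_genes, paste0("GENE", seq_len(n_other))),
  logFC   = rnorm(n_total, mean = 0, sd = 1),
  P.Value = runif(n_total),
  stringsAsFactors = FALSE
)

# Build the plot only if EnhancedVolcano is installed
if (requireNamespace("EnhancedVolcano", quietly = TRUE)) {
  p <- EnhancedVolcano::EnhancedVolcano(
    df3,
    lab = df3$gene,            # labels come from the 'gene' column
    x = "logFC",
    y = "P.Value",
    selectLab = target_genes,  # label only the genes of interest
    pCutoff = 0.05,
    FCcutoff = 1,              # color only |log2 FC| > 1
    pointSize = 2.0,
    labSize = 3.0,
    boxedLabels = TRUE,        # draw labels in boxes
    drawConnectors = TRUE,     # connect each label to its point
    widthConnectors = 0.5,
    max.overlaps = Inf         # assumed available: do not drop overlapping labels
  )
  # Evaluate `p` (or call print(p)) to draw the plot
}
```

Plotting the whole table instead of a 10-gene subset keeps the genes of interest in context, so it is visible where they sit relative to everything else.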
I changed FCcutoff to 1 and it gives the following plot:
But there are no red dots and the result looks meaningless.
There are plenty of red dots. You're only supposed to have minimal genes that are observed at that threshold.
On the contrary, your earlier plot was meaningless. This one looks like every other Volcano plot out there.
NS simply means that neither the logFC nor the p-value condition is satisfied. Blue dots show that only the p-value condition is satisfied, and that is true for 2 genes. So how should I interpret it? Which genes are statistically significant? Just 2 genes?
The blue dots are still one level below the red ones in significance. You should be looking at creating genes of interest from the DE genes, not the other way around. This plot shows me that a couple of your genes of interest are minimally DE to a good level of certainty (wouldn't even count as DE for the most part but we're scraping the bottom of the barrel) and none of them are significantly differentially expressed to a good level of certainty. That is, there are a few genes that have (pval < 0.05 && 0 < abs(logFC) < 1) but none where (pval < 0.05 AND abs(logFC) > 1). I'd also look at the padj - there are probably no genes among your genes of interest that are actually blue or red.
So can we say these 2 genes, PDLIM3 and FHOD3, are statistically significant?
Sure, but significant for what? Their logFC is too low for them to matter. Please read Mensur Dlakic's excellent summary of why none of your genes are actually meaningful.
So if none of my genes are meaningful, what should I do? Should I change the logFC and p-value cutoffs? How can I include them in my study?
You are asking if you should change the meaning of the word "meaningful" if there is nothing meaningful in your current question. Please think about what you're saying.
You seem to have made up your mind about these genes being important to you regardless of their significance as observed in your experiment. No one can help you there.
Like I said earlier, "You should be looking at creating genes of interest from the DE genes, not the other way around"
But my target genes are
First I found them in 6 GEO samples and then did the analysis in 2 conditions
I did the test and found:
The data that I used was 3882 differentially expressed genes
There are 3882 DE genes? DE genes should have both logFC>1 AND pval < 0.05.
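As a minimal sketch of the combined criterion stated above, the snippet below sorts genes into the four categories a volcano plot colors and keeps only those passing both cutoffs. The table `res`, the helper `classify_genes`, and the category names are hypothetical and made up for illustration; the column names simply mirror the ones used in the question, and the Benjamini-Hochberg adjustment is one common choice rather than something the thread prescribes.

```r
# Made-up results table; values are illustrative only
res <- data.frame(
  gene    = c("G1", "G2", "G3", "G4"),
  logFC   = c(1.8, 0.3, -1.2, 0.1),
  P.Value = c(0.001, 0.01, 0.20, 0.60),
  stringsAsFactors = FALSE
)

# Multiple-testing adjusted p-values (Benjamini-Hochberg), worth inspecting too
res$adj.P <- p.adjust(res$P.Value, method = "BH")

# Hypothetical helper: assign each gene to one of four categories
classify_genes <- function(logFC, p, fc_cutoff = 1, p_cutoff = 0.05) {
  big <- abs(logFC) > fc_cutoff   # passes the fold-change cutoff
  sig <- p < p_cutoff             # passes the p-value cutoff
  ifelse(big & sig, "both",
         ifelse(sig, "p-value only",
                ifelse(big, "logFC only", "NS")))
}

res$class <- classify_genes(res$logFC, res$P.Value)

# Genes that satisfy both conditions, i.e. the "red" points
de_genes <- res$gene[res$class == "both"]
print(res)
print(de_genes)
```

Applying the same function with `res$adj.P` in place of `res$P.Value` shows how many genes survive once multiple testing is taken into account.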
What means are you comparing?
Basically, from the 3882 genes, I selected 10 target genes between 2 conditions (Dilated cardiomyopathy vs control) and ran the t-test based on the logFC column, and it gives the following results:
Please consult a statistician - I cannot help you any more. Here's the last piece of direction I have for you - one of your genes is TTN, which is the longest protein in the human proteome, and not accounting for that WILL influence your observations.
Every observation is separate, so how can TTN affect my results?
Thanks. Can I take logFC as 0.5?
None of the genes you seem to be interested in would have significantly changed expression even if you take LogFC=0.5, as all of their absolute LogFC values are < 0.5. In case you don't know what that means, LogFC=0.5 means 1.41-fold change in expression (2^0.5). As to whether you could take that cutoff: some people will accept 1.41 DE as significantly changed if it is also significant according to p-values, and others will not. Many people, myself included, like to see at least 2-fold change in expression. In your case it doesn't matter much because only two genes satisfy the p-value cutoff, and their absolute LogFC values are small.
You have received plenty of feedback here, and your eyes should be telling you something that your brain possibly refuses to accept. There is no point in asking the same question but in different ways, which is pretty much what you have been doing for the past 2-3 days. I suggest you read some DE papers and tutorials on the internet and hopefully it will become clear why in your results there is not much that will inspire confidence in most people.