DESeq2 MA plot label subset of genes
3.7 years ago
kstangline ▴ 80

Hello,

I'm new to R and I'm trying to make an MA plot from my DESeq2 results using ggplot2.

I have figured out how to make an MA plot using the following code:

plot_poly <-
  all_counts.poly.results %>%
  as.data.frame() %>%
  ggplot(aes(log2(baseMean), log2FoldChange)) +
  geom_point(aes(color = pvalue < 0.05), cex = 0.1) +
  labs(title = "Poly Torin Treated vs Untreated")

all_counts.poly.results contains the EnsGeneIDs I'm interested in labeling on the graph, but there are too many to label them all, so I want to filter them against an Excel file that lists specific EnsGeneIDs.

For example, I was thinking about setting it up like this with dplyr, but I'm not sure if this is correct.

# contains EnsGeneIDs I want to be plotted
EnsGeneIDs <- read_excel("/Users/kylestangline/Desktop/geneIDs.xls") 

filtered_all_counts.poly.results <- all_counts.poly.results %>%
  filter(all_counts.poly.results$EnsGeneIDs %in% EnsGeneIDs) # filter only specific EnsGeneIDs

Could I then use these filtered EnsGeneIDs as labels on the MA plot I made above?

RNA-Seq deseq R • 2.8k views

Have you tried running that and it didn’t work? I can’t tell what exactly you’re asking. Do you want to only plot those genes, or do you want to plot all but only label those ones?

Side note: you can just put the bare column name in the filter call without the df$, although I would recommend changing the name of the EnsGeneIDs object so it differs from the column name. Also, if EnsGeneIDs reads in as a dataframe (rather than a vector) you may need to say %in% EnsGeneIDs$V1 (if that’s the column’s name) or convert it to a vector.
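
To make the side note concrete, here is a minimal sketch on made-up data. The names res_df, gene_table, ids_of_interest and the column name V1 are placeholders I chose for illustration; the real column name depends on what is in your Excel file.

```r
library(dplyr)

# Toy stand-in for the DESeq2 results (values are invented for illustration)
res_df <- data.frame(
  EnsGeneIDs = c("ENSG01", "ENSG02", "ENSG03", "ENSG04"),
  baseMean = c(10, 200, 35, 1500),
  log2FoldChange = c(0.2, -1.5, 2.1, 0.0),
  stringsAsFactors = FALSE
)

# Toy stand-in for what read_excel() gives back: a table, not a vector.
# "V1" is an assumed column name.
gene_table <- data.frame(V1 = c("ENSG02", "ENSG04"), stringsAsFactors = FALSE)

# Pull the ID column out as a plain character vector, under a name
# that differs from the EnsGeneIDs column
ids_of_interest <- gene_table$V1

# Bare column name inside filter(); no res_df$ prefix needed
labelled_only <- res_df %>%
  filter(EnsGeneIDs %in% ids_of_interest)
```

Here labelled_only keeps only the rows whose EnsGeneIDs appear in ids_of_interest.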


Thanks for the reply! I want to plot all the genes but label only a few (about 10 genes out of the thousands that ggplot2 plots), which is why I wanted to filter my all_counts.poly.results dataframe with the Excel file I read in.

3.7 years ago
loughrae ▴ 90

To label specific points, add a new column to all_counts.poly.results that holds the gene name (or whatever you want the label to be) when the gene is in your list, and an empty string otherwise:

[convert poly to df and IDs to vector if needed]

all_counts.poly.results$mark <- ifelse(all_counts.poly.results$EnsGeneIDs %in% IDs, all_counts.poly.results$EnsGeneIDs, '')
ggplot(all_counts.poly.results, aes(...)) + geom_point(...) + geom_text(aes(label = mark))

You could also try putting NA instead of '' in the ifelse().
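
For anyone who wants something to paste and run, here is a sketch of the same idea on simulated data. The data frame res_df, the IDs in ids_of_interest and all values are invented; only the column names follow the question. It uses the NA variant, with na.rm = TRUE so geom_text() drops the unlabelled rows quietly.

```r
library(ggplot2)

# Simulated stand-in for the DESeq2 results (not real data)
set.seed(1)
res_df <- data.frame(
  EnsGeneIDs = sprintf("ENSG%03d", 1:200),
  baseMean = 2^runif(200, 1, 15),
  log2FoldChange = rnorm(200),
  pvalue = runif(200),
  stringsAsFactors = FALSE
)

# Hypothetical genes to label
ids_of_interest <- c("ENSG005", "ENSG050", "ENSG150")

# Label column: gene ID for genes of interest, NA for everything else.
# If EnsGeneIDs were a factor, wrap it in as.character() first,
# otherwise ifelse() would return the integer codes.
res_df$mark <- ifelse(res_df$EnsGeneIDs %in% ids_of_interest,
                      res_df$EnsGeneIDs, NA)

# All genes are drawn as points; only rows with a non-NA mark get text
p <- ggplot(res_df, aes(log2(baseMean), log2FoldChange)) +
  geom_point(aes(color = pvalue < 0.05), size = 0.5) +
  geom_text(aes(label = mark), na.rm = TRUE, vjust = -0.8, size = 3)

print(p)
```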

3.1 years ago

Just to add a little bit more depth to the answer by @loughrae:

You can also use case_when from dplyr to create the new variable in a more readable fashion:

library(dplyr)
all_counts.poly.results <- all_counts.poly.results %>%
  mutate(mark = case_when(
    EnsGeneIDs %in% IDs ~ as.character(EnsGeneIDs),
    TRUE ~ NA_character_))

And for plotting, we can use the ggrepel library, which keeps the labels from overlapping and makes the plot easier to read.

library(ggrepel)
ggplot(all_counts.poly.results, aes(..., label = mark)) +
  geom_point(...) +
  geom_label_repel(box.padding = 0.5, max.overlaps = Inf)

You can play a bit with the options of ggrepel until you get what you want.
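
A different route, closer to what the question first proposed, is to skip the extra column and hand a filtered copy of the data to the label layer only. This is a sketch on simulated data, not the method from either answer; res_df, ids_of_interest and to_label are names I made up. One caveat as far as I understand ggrepel: with this approach the labels only repel from the points in the filtered subset, whereas a mark column with empty strings keeps all points in play.

```r
library(dplyr)
library(ggplot2)
library(ggrepel)

# Simulated stand-in for the DESeq2 results (not real data)
set.seed(42)
res_df <- data.frame(
  EnsGeneIDs = sprintf("ENSG%03d", 1:200),
  baseMean = 2^runif(200, 1, 15),
  log2FoldChange = rnorm(200),
  pvalue = runif(200),
  stringsAsFactors = FALSE
)

# Hypothetical genes to label
ids_of_interest <- c("ENSG010", "ENSG020", "ENSG030")

# Rows that should receive a label
to_label <- res_df %>%
  filter(EnsGeneIDs %in% ids_of_interest)

# Points use the full data; the label layer uses only the filtered rows
p <- ggplot(res_df, aes(log2(baseMean), log2FoldChange)) +
  geom_point(aes(color = pvalue < 0.05), size = 0.5) +
  geom_label_repel(
    data = to_label,
    aes(label = EnsGeneIDs),
    box.padding = 0.5,
    max.overlaps = Inf
  )

print(p)
```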
