Scatter plot in ggplot2: two colours for two different conditions
6.8 years ago
1769mkc ★ 1.2k

I'm using two conditions, wild type and treated. I get a scatter plot, but I want the points in two different colours and I'm not sure how to do it. I tried to group them by melting the data frame, but I got an aesthetics error, so I removed the gene column. Now I have only two columns, but the data points are all labelled the same.

df <- HL_60
#gene = df[,1]
WT <- df[, 2:3]
ATRA <- df[, 4:5]
wt.mean <- apply(WT, 1, mean)
atra.mean <- apply(ATRA, 1, mean)
WT_ATRA <- cbind.data.frame(wt.mean, atra.mean)
head(WT_ATRA)

library(ggplot2)
library(ggpmisc)
D1 <- WT_ATRA
head(D1)
my.formula <- y ~ x

p <- ggplot(data = WT_ATRA, aes(x = wt.mean, y = atra.mean)) +
  geom_point(color = "red", size = 0.9) +
  geom_smooth(method = "lm", se = FALSE, color = "black", formula = my.formula) +
  stat_poly_eq(formula = my.formula,
               eq.with.lhs = "italic(hat(y))~`=`~",
               aes(label = paste(..eq.label.., ..rr.label.., sep = "*plain(\",\")~")),
               parse = TRUE)

The figure I get is this one

Any suggestion or help on how to use two colours for the two different conditions would be highly appreciated.

R • 11k views

I would add a column called condition (with values WT or treated) and melt the data frame; colouring by condition should then give you the two conditions in two separate colours.

Please don't follow this as it is not possible with this plot, see the discussion below. Apologies for recommending this before looking at the data.


Well, I would be glad if you could show me using the code I posted. When I melted the data earlier, I got one column for gene, one column for variable, and a last one for value, but I still couldn't figure it out.


Am I not doing the same thing when I bind the columns, one for my wild type and the other for treated?


Can you post some example data that resembles HL_60?

Okay, here is a small subset of my data set:


gene                WT1           WT2           AT1           AT2           VD1           VD2
ENSG00000227232.5   5.2822087357  6.4447483588  6.8860571504  6.9411803286  5.3968150313  6.4528522014
ENSG00000278267.1   3.5858305786  3.6836795858  3.5523112     3.5474228185  3.7568282659  2.9090525017
ENSG00000238009.4   2.22313652    2.3074139286  2.6703264597  1.6500091151  2.1942827694  0.7234491107
ENSG00000233750.3   1.1525240028  1.7527357273  2.6703264597  3.2560180286  2.6701449288  1.8497987198
ENSG00000269981.1   0             0             2.8343603766  3.2560180286  1.4793911805  0
ENSG00000241860.4   2.5587816592  2.8718278554  4.9846498052  5.5538107754  3.5521141942  3.8751528623
ENSG00000241599.1   0             1.7527357273  2.6703264597  3.5474228185  1.4793911805  0


Have a look; I would be glad to get your suggestions.

My first column is the gene and the rest are my samples: the first ones are my wild-type controls and the rest are treated, so I'm making pairwise comparisons.


So, what should colour represent? I honestly don't get it.


Well, what I need is the R^2 value, but my boss wants the samples labelled to show that the data points come from two different conditions. I would be glad if you could solve my woe.


Hi krushnach80. I believe that your supervisor does not understand the plot.

What you are plotting cannot be colour-coded based on 2 different conditions because the values in your plot are summarising a difference between both conditions. This is just a simple scatter plot comparing the mean in disease versus WT.

You could colour the dots in a gradient fashion based on the intensity of the mean.
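A minimal sketch of that gradient idea, not a definitive recipe. It rebuilds a toy WT_ATRA from the first seven genes posted in this thread (rounded to 2 decimals). The `intensity` column, defined here as the average of the two condition means, and the two gradient colours are assumptions made for illustration.

```r
library(ggplot2)

# First seven genes from the example table in this thread, rounded to 2 decimals
df <- data.frame(
  WT1 = c(5.28, 3.59, 2.22, 1.15, 0.00, 2.56, 0.00),
  WT2 = c(6.44, 3.68, 2.31, 1.75, 0.00, 2.87, 1.75),
  AT1 = c(6.89, 3.55, 2.67, 2.67, 2.83, 4.98, 2.67),
  AT2 = c(6.94, 3.55, 1.65, 3.26, 3.26, 5.55, 3.55)
)

# Per-gene mean of the two replicates in each condition
WT_ATRA <- data.frame(
  wt.mean   = rowMeans(df[, c("WT1", "WT2")]),
  atra.mean = rowMeans(df[, c("AT1", "AT2")])
)

# ASSUMPTION: "intensity" is taken as the average of the two condition means
WT_ATRA$intensity <- (WT_ATRA$wt.mean + WT_ATRA$atra.mean) / 2

# Map the continuous intensity to a two-colour gradient
p <- ggplot(WT_ATRA, aes(x = wt.mean, y = atra.mean, colour = intensity)) +
  geom_point(size = 2) +
  scale_colour_gradient(low = "grey70", high = "firebrick1")
print(p)
```

Because colour is mapped to a continuous variable, every dot still stands for one gene in both conditions; the shade only tells you how highly expressed that gene is overall.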


Yeah, I'm just plotting the replicates to show that when there is induction with ATRA and VD3 the R^2 decreases, which shows that the expression changes after induction. But do you have any other way to show the difference using a scatter plot?
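For what it's worth, the R^2 comparison described here can be sketched in base R. This is only an illustration on the first seven genes posted in this thread (rounded to 2 decimals), far too few to draw conclusions from; the helper name `r2` is made up for this example.

```r
# First seven genes from the example table in this thread, rounded to 2 decimals
df <- data.frame(
  WT1 = c(5.28, 3.59, 2.22, 1.15, 0.00, 2.56, 0.00),
  WT2 = c(6.44, 3.68, 2.31, 1.75, 0.00, 2.87, 1.75),
  AT1 = c(6.89, 3.55, 2.67, 2.67, 2.83, 4.98, 2.67),
  AT2 = c(6.94, 3.55, 1.65, 3.26, 3.26, 5.55, 3.55)
)

# Hypothetical helper: for a simple linear fit y ~ x, R^2 equals the
# squared Pearson correlation
r2 <- function(x, y) cor(x, y)^2

# Replicate vs replicate within each condition
r2_wt_reps   <- r2(df$WT1, df$WT2)
r2_atra_reps <- r2(df$AT1, df$AT2)

# Condition vs condition, using the per-gene replicate means
wt.mean   <- rowMeans(df[, c("WT1", "WT2")])
atra.mean <- rowMeans(df[, c("AT1", "AT2")])
r2_wt_vs_atra <- r2(wt.mean, atra.mean)

# Cross-check against the R^2 reported by lm()
fit <- lm(atra.mean ~ wt.mean)
r2_from_lm <- summary(fit)$r.squared

print(round(c(WT_reps = r2_wt_reps, ATRA_reps = r2_atra_reps,
              WT_vs_ATRA = r2_wt_vs_atra, lm = r2_from_lm), 3))
```

Reporting the replicate-vs-replicate R^2 next to the condition-vs-condition R^2 gives a baseline, so a drop after induction can be judged against ordinary replicate noise.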


What I understand is that he wants the points coming from wild type to have one colour and the ones coming from ATRA treatment a different colour, because that's how his scatter plot looked when he made it with the SeqMonk tool. I'm trying to do the same.


...but, if you wanted to do that, then each dot would have 2 colours because each dot represents both WT (value on x-axis) and ATRA (value on y-axis).

You probably mean something like this: https://www.bioinformatics.babraham.ac.uk/projects/seqmonk/Help/3%20Visualisation/3.2%20Figures%20and%20Graphs/scatter_plot_sublists.png

In that plot, only certain dots are shaded, presumably those that have passed some other statistical test like FDR-corrected P values or fold-change, somewhat akin to a volcano plot.


Yeah, exactly like that SeqMonk plot. And yes to this: "then each dot would have 2 colours because each dot represents both WT (value on x-axis) and ATRA (value on y-axis)", because I have the same number of genes for both conditions, WT and ATRA. I played around with all kinds of combinations to put two different colours for WT and ATRA on the same gene, as they have different values.

And I have given a small sample of my data set above.


To give you an example, this code will colour red (firebrick1) any genes that have a linear fold change >= 4.0 in ATRA relative to WT, and colour green (forestgreen) any genes with a linear fold change <= -4.0 (that is, an ATRA/WT ratio of 0.25 or less).

Not sure if this helps (or even works). You can play around with the cut-offs in order to choose how you want to shade the points based on different statistical cut-off thresholds.

WT_ATRA$significance <- "NS"
WT_ATRA$significance[((WT_ATRA$atra.mean / WT_ATRA$wt.mean) >= 4.0)] <- "Up"
# A ratio of non-negative means is never negative, so a -4-fold change is tested as ratio <= 1/4
WT_ATRA$significance[((WT_ATRA$atra.mean / WT_ATRA$wt.mean) <= 1/4.0)] <- "Down"
WT_ATRA$significance <- factor(WT_ATRA$significance, levels=c("NS","Up","Down"))

ggplot(data = WT_ATRA, aes(x=wt.mean, y=atra.mean) ) +
        geom_point(aes(color=significance), alpha=1/2, size=0.8) +
        scale_color_manual(values=c("grey", "firebrick1", "forestgreen")) +
                ....

I will try your code and it will definitely work, but I have to explain the plot to my boss... It seems he doesn't understand.


Okay, I will be your new boss


Glad I could have that opportunity, maybe in the future.


Just hook your boss up on biostars ;)

6.8 years ago

The plot is in a 2D plane, which means each point has two values associated with it (x, y). In your case, each point is a gene and has (wt_mean, atra_mean) as its coordinates. So, I don't get the point of colouring on the basis of wt_mean and atra_mean. What are you trying to infer?

The plot you mentioned in the comment uses fold change as the basis for the colour code, which can be achieved with the following code in ggplot2.

  # The names in `values` must match the labels produced by cut() exactly (no spaces)
  geom_point(aes(colour = cut(log2fc, c(-5, -1, 1, 5))), size = 2) +
  scale_color_manual(name = "Fold Change",
                     values = c("(1,5]" = "#FF6666",
                                "(-1,1]" = "#F5F5F5",
                                "(-5,-1]" = "#60B9FF"),
                     labels = c("<-2", "-2< & <2", ">2"))

log2fc is the log2-transformed fold change, which is divided into three bins: (-5,-1], (-1,1] and (1,5]. Values outside these bins become NA. Change the breaks according to your requirements.
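Since the thread never shows how log2fc would be obtained, here is a self-contained sketch under a loudly stated assumption: that the posted expression values are already on a log2 scale, so the log2 fold change is simply the difference of the two means. If your values are on a linear scale, something like log2((atra.mean + 1) / (wt.mean + 1)) would be needed instead. The data are the first seven genes from this thread (rounded to 2 decimals). The bin names Down/NS/Up and the open-ended breaks are choices made for this example, not part of the answer above.

```r
library(ggplot2)

# First seven genes from the example table in this thread, rounded to 2 decimals
df <- data.frame(
  WT1 = c(5.28, 3.59, 2.22, 1.15, 0.00, 2.56, 0.00),
  WT2 = c(6.44, 3.68, 2.31, 1.75, 0.00, 2.87, 1.75),
  AT1 = c(6.89, 3.55, 2.67, 2.67, 2.83, 4.98, 2.67),
  AT2 = c(6.94, 3.55, 1.65, 3.26, 3.26, 5.55, 3.55)
)

WT_ATRA <- data.frame(
  wt.mean   = rowMeans(df[, c("WT1", "WT2")]),
  atra.mean = rowMeans(df[, c("AT1", "AT2")])
)

# ASSUMPTION: values are already log2-scale, so log2FC is a difference of means
WT_ATRA$log2fc <- WT_ATRA$atra.mean - WT_ATRA$wt.mean

# Bin at |log2FC| = 1 (2-fold); -Inf/Inf as outer breaks so no gene becomes NA
WT_ATRA$fc.bin <- cut(WT_ATRA$log2fc,
                      breaks = c(-Inf, -1, 1, Inf),
                      labels = c("Down", "NS", "Up"))

# Colour each gene by its fold-change bin
p <- ggplot(WT_ATRA, aes(x = wt.mean, y = atra.mean, colour = fc.bin)) +
  geom_point(size = 2) +
  scale_colour_manual(name = "Fold change",
                      values = c(Down = "#60B9FF", NS = "grey60", Up = "#FF6666"),
                      drop = FALSE)
print(p)
```

Giving cut() explicit labels keeps the names in `values` easy to match, and `drop = FALSE` keeps all three bins in the legend even when one of them is empty.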


Okay, so I can put the FC into it directly instead of going over the samples again.
