Question

Scatter Plot For Correlations With Heatdensity

7

12.2 years ago

k.nirmalraman ★ 1.1k

Hi All,

I am trying to show some correlation between two samples and would like to do a scatter plot for the same.

I tried the following with ggplot2, but I am wondering if it's possible to get the heat density as shown here:

qplot(x, y, data = data) + geom_abline(colour = "red", size = 1) + theme_bw()

[Image: what I got]

I would like a scatter plot as shown below.

[Image: correlation plot]

Can you help me achieve this? Thanks!!

visualization r • 22k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 12.2 years ago by k.nirmalraman ★ 1.1k

Ram · Answer 1 · 2014-05-19

12

11.3 years ago

Jason ▴ 940

I believe the plots you asked about were originally made with the R package LSD. I would check it out: it is an easy-to-use set of commands that makes great-looking plots. It can also calculate different types of correlation for you, e.g. Spearman and Pearson.

n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x,y)
library(LSD)
heatscatter(DF[,1],DF[,2])
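
If you only need the correlation values themselves, they can also be computed in base R without any plotting package. This is a minimal sketch: the seed and the dependence of y on x are assumptions added purely for illustration, so that the two coefficients are not both close to zero.

```r
set.seed(42)              # assumption: fixed seed for reproducibility
n <- 10000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # assumption: y is made to depend on x for illustration

# cor() is part of base R (stats); 'method' selects the coefficient
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

round(c(pearson = pearson, spearman = spearman), 3)
```

Either value can then be quoted in the plot title or the figure caption.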

Edit: for some reason the image I generated isn't appearing (I copied and pasted it). Any help from a mod? It looks like the images in this link.

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by Jason ▴ 940

1

Dragging and dropping an image (it looks like that's what you did) won't work. You need to post the image elsewhere and then just link to it.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.3 years ago by Devon Ryan 105k

0

I have a question, how can you add a label next to a heatscatter (which I am loving as my new fav graph) to show the viewer what the difference in density of points is between two colours? Something along the lines of this:

Red = 10-20 data points overlapping

Blue = 0-10 data points overlapping

Is it even possible? It seems like something my PI would like to see, as would I.

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 10.9 years ago by james.lloyd ▴ 100

1

I don't know exactly how to do that, but maybe the contour function will help. I asked a developer of LSD (Bjoern Schwalb), who actually posts a decent amount here, what the values along the contour mean (see add.contour = TRUE), and he told me "the values shown are density estimates from a 2D Kernel Density Estimator function that is used internally (KDE2D)". For my presentations and a recent manuscript I just made a color bar that goes from blue to red with the other colors in between, and said that red was high density and blue was low density. Most people are generally satisfied with that as long as you mention the sample size (e.g. n = 100). The publications I've seen that have used heatscatter have never specified exact numbers of how many data points overlap.
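
A rough sketch of that kind of labelled colour key in base R follows. It is not what LSD does internally: it calls kde2d() from MASS directly, reads each point's density from the grid cell it falls in, and splits the densities into five bins. The palette, the number of bins and the grid size are all assumptions. The legend shows density estimates, not counts of overlapping points.

```r
library(MASS)   # kde2d() comes with the recommended MASS package

set.seed(1)     # assumption: fixed seed for reproducibility
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# 2D kernel density estimate on a 100 x 100 grid (grid size is an assumption)
dens <- kde2d(x, y, n = 100)

# Find the grid cell each point falls in and read off the density there
ix <- findInterval(x, dens$x, all.inside = TRUE)
iy <- findInterval(y, dens$y, all.inside = TRUE)
pointdens <- dens$z[cbind(ix, iy)]

# Split the densities into 5 equal-width bins, one palette colour per bin
pal  <- colorRampPalette(c("blue", "cyan", "green", "yellow", "red"))(5)
bins <- cut(pointdens, breaks = 5)

plot(x, y, col = pal[as.integer(bins)], pch = 19, cex = 0.5,
     main = paste0("n = ", n))
legend("topright", legend = levels(bins), fill = pal,
       title = "Estimated density", cex = 0.7)
```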

You may want to look into hexbin if finding the number of data points overlapping is really important. I think it's supposed to do a good job of performing that task: http://www.statmethods.net/graphs/scatterplot.html (it's under high density scatter plots)
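
A minimal hexbin sketch, assuming the hexbin package is installed from CRAN. The simulated data and xbins = 30 are placeholders. Each hexagon stores the number of points that fall inside it, so here the legend is in actual counts.

```r
# Assumption: install.packages("hexbin") has been run beforehand
library(hexbin)

set.seed(1)     # assumption: fixed seed for reproducibility
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# Bin the points into hexagons; xbins sets the number of bins along the x axis
hb <- hexbin(x, y, xbins = 30)

# Number of points in each non-empty hexagon
summary(hb@count)

# The plot method draws the hexagons together with a count legend
plot(hb, main = paste0("n = ", n))
```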

This may also help for future LSD work if you hadn't seen it already: http://cran.fhcrc.org/web/packages/LSD/LSD.pdf

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 10.9 years ago by Jason ▴ 940

0

This is awesome! Wish I could save the plots as ggplot objects though

ADD REPLY • link 9.3 years ago by sviatoslav.kendall ▴ 990

Ram · Answer 2 · 2013-05-31

7

12.2 years ago

Irsan ★ 7.8k

Adapted from Stack Overflow:

# Generate random data; swap this for yours :-)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x, y)

# Calculate the 2D kernel density over a grid (25 x 25 by default)
library(MASS)
dens <- kde2d(x, y)

# Create a new data frame from that 2D density grid.
# expand.grid() varies x fastest, which matches the column-major order of
# as.vector(dens$z): rows of dens$z correspond to x, columns to y.
gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Fit a loess surface to the gridded density
mod <- loess(zgr ~ xgr * ygr, data = gr)

# Apply the model to the original data to estimate the density at each point
DF$pointdens <- predict(mod, newdata = data.frame(xgr = x, ygr = y))

# Draw the plot, colouring each point by its estimated density
library(ggplot2)
ggplot(DF, aes(x = x, y = y, color = pointdens)) +
  geom_point() +
  scale_colour_gradientn(colours = rainbow(5)) +
  theme_bw()

Scatterplot with points coloured according to the amount of points in that area
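
The row order of the density grid built above can be checked directly rather than taken on trust. This small sketch uses a deliberately tiny 5 x 5 grid, and the cell (i, j) = (2, 4) is an arbitrary choice. It confirms that the row of the data frame for grid cell (i, j) holds dens$x[i], dens$y[j] and dens$z[i, j].

```r
library(MASS)

set.seed(1)                      # assumption: fixed seed for reproducibility
x <- rnorm(500)
y <- rnorm(500)
dens <- kde2d(x, y, n = 5)       # small grid so it is easy to inspect

gr <- data.frame(with(dens, expand.grid(x, y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")

# Cell (i, j): i indexes dens$x (rows of z), j indexes dens$y (columns of z)
i <- 2
j <- 4
row <- (j - 1) * length(dens$x) + i   # x varies fastest in expand.grid()

order_ok <- gr$xgr[row] == dens$x[i] &&
  gr$ygr[row] == dens$y[j] &&
  gr$zgr[row] == dens$z[i, j]
order_ok
```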

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 12.2 years ago by Irsan ★ 7.8k

0

@Irsan: This is what I would like to generate for my gene expression data, with dimensions x = 4 x 15000 and y = 4 x 15000, to show the correlation between all gene pairs in x and y. Could you please suggest how I should modify my data to obtain a scatterplot of gene expression coloured by heat density?

ADD REPLY • link 11.9 years ago by spaul8505 ▴ 20

0

Yes. What does your data look like now?

ADD REPLY • link 11.9 years ago by Irsan ★ 7.8k

0

My data is initially two data frames of dimension 15000 x 4 each, where the rows are the genes and the columns are the samples. For these two data frames, I would like to make the scatterplot of gene correlation density.

ADD REPLY • link 11.9 years ago by spaul8505 ▴ 20

0

OK, clear. I will get back to you at the end of next week. Leaving for holiday now.

ADD REPLY • link 11.9 years ago by Irsan ★ 7.8k

0

This is the same question as "Scatterplots Showing Correlation Between Gene Pairs", right?

ADD REPLY • link 11.9 years ago by Irsan ★ 7.8k