I believe the plots you want to make were originally made with the R package LSD. I'd check it out: it's a very easy-to-use package that makes great-looking plots, and it can also calculate different types of correlations for you (e.g. Spearman and Pearson).
# simulate two independent standard normal variables
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x,y)
# density-coloured scatter plot with LSD
library(LSD)
heatscatter(DF[,1],DF[,2])
Edit: for some reason the image I generated isn't appearing (I copied and pasted it). Could a mod help? It looks like the images in this link.
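For the correlations themselves you don't strictly need LSD: base R's cor() computes both Pearson and Spearman coefficients, and cor.test() adds a p-value and confidence interval. A short sketch on simulated data; the dependence between x and y (0.5 * x) is an assumption added only so the coefficients aren't close to zero:

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # hypothetical: simulated dependence between x and y

# Pearson: linear association on the raw values
r_pearson <- cor(x, y, method = "pearson")
# Spearman: Pearson correlation of the ranks, insensitive to monotone transforms
r_spearman <- cor(x, y, method = "spearman")

# cor.test() also reports a p-value and, for Pearson, a 95% confidence interval
ct <- cor.test(x, y, method = "pearson")

print(c(pearson = r_pearson, spearman = r_spearman))
print(ct$conf.int)
```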
Answer written 10.7 years ago by Jason; updated 5.0 years ago by Ram.
Dragging and dropping an image (it looks like that's what you did) won't work. You need to post the image elsewhere and then just link to it.
I have a question: how can you add a label next to a heatscatter (which I am loving as my new fav graph) to show the viewer what density of points each colour corresponds to? Something along these lines:
Red = 10-20 data points overlapping
Blue = 0-10 data points overlapping
Is that even possible? It seems like something my PI would like to see, as would I.
I don't know exactly how to do that, but the contour option may help. I asked one of the LSD developers (Bjoern Schwalb), who posts here fairly often, what the values along the contours mean (see add.contour = TRUE), and he told me "the values shown are density estimates from a 2D Kernel Density Estimator function that is used internally (KDE2D)". For my presentations and recent manuscript I just made a colour bar running from blue to red, with the other colours in between, and said that red is high density and blue is low density. Most people are satisfied with that as long as you mention the sample size (e.g. n = 100). The publications I've seen that use heat scatters have never specified exactly how many data points overlap.
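To get something like the "Red = 10-20, Blue = 0-10" legend asked about above, one option (my own approximation, not something heatscatter does as far as I know) is to convert the kernel density estimate into an expected number of points per grid cell: for n points and a cell of area A, the expected count near (x, y) is roughly n * f(x, y) * A. A minimal sketch using MASS::kde2d, assuming a 50 x 50 grid; that grid size is arbitrary, and the "count" scales with the cell size you pick. nearest_index() is a hypothetical helper written for this example:

```r
library(MASS)  # kde2d(); MASS is a recommended package shipped with R

set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# 2D kernel density estimate on a 50 x 50 grid (grid size is an assumption)
dens <- kde2d(x, y, n = 50)
cell_area <- diff(dens$x[1:2]) * diff(dens$y[1:2])

# Hypothetical helper: index of the nearest grid line for each value
nearest_index <- function(v, grid) {
  findInterval(v, (head(grid, -1) + tail(grid, -1)) / 2) + 1
}
pt_dens <- dens$z[cbind(nearest_index(x, dens$x), nearest_index(y, dens$y))]

# Expected number of points in one grid cell around each point: n * f(x, y) * A
pt_count <- n * pt_dens * cell_area

# Bin the expected counts into a few classes, one colour per class
breaks <- pretty(pt_count, n = 5)
cls <- cut(pt_count, breaks = breaks, include.lowest = TRUE)
pal <- colorRampPalette(c("blue", "yellow", "red"))(nlevels(cls))

plot(x, y, pch = 19, cex = 0.4, col = pal[cls])
legend("topright", legend = levels(cls), col = pal, pch = 19, cex = 0.8,
       title = "approx. points per cell")
```

These are smoothed expected counts, not literal numbers of overlapping points, so it's worth saying so in the figure legend.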
You may want to look into hexbin if knowing the number of overlapping data points is really important; I think it does a good job of that: http://www.statmethods.net/graphs/scatterplot.html (see the section on high density scatter plots).
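If you want actual counts rather than density estimates, binning is the way to go. hexbin does this with hexagonal bins (I believe plot(hexbin(x, y, xbins = 30)) draws a legend of counts, but check its documentation). As a sketch that needs only base R, here is the same idea with square bins; the 30 x 30 bin grid is an arbitrary assumption:

```r
set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# Square bins (30 per axis is an arbitrary choice)
nb <- 30
xbrk <- seq(min(x), max(x), length.out = nb + 1)
ybrk <- seq(min(y), max(y), length.out = nb + 1)
xb <- cut(x, xbrk, include.lowest = TRUE)
yb <- cut(y, ybrk, include.lowest = TRUE)

# Exact number of points in each bin
counts <- table(xb, yb)

# Count of the bin each point falls into (an exact "points overlapping here" value)
pt_bin_count <- counts[cbind(as.integer(xb), as.integer(yb))]

# Draw the bin counts; empty bins are left blank
m <- unclass(counts)
m[m == 0] <- NA
image(xbrk, ybrk, m, col = hcl.colors(20, "YlOrRd", rev = TRUE),
      xlab = "x", ylab = "y")
print(range(counts))
```

Unlike the KDE colours, these numbers are literal counts, but they depend on the bin size.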
# generate random data, swap this for yours :-)!
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
DF <- data.frame(x,y)
# Calculate 2d density over a grid
library(MASS)
dens <- kde2d(x,y)
# create a new data frame of that 2d density grid
# (expand.grid varies x fastest, which matches the column-major order of as.vector(dens$z), so x, y and z line up)
gr <- data.frame(with(dens, expand.grid(x,y)), as.vector(dens$z))
names(gr) <- c("xgr", "ygr", "zgr")
# Fit a smooth surface (loess) to the gridded density so it can be evaluated at any (x, y)
mod <- loess(zgr~xgr*ygr, data=gr)
# Apply the model to the original data to estimate density at that point
DF$pointdens <- predict(mod, newdata=data.frame(xgr=x, ygr=y))
# Draw plot
library(ggplot2)
ggplot(DF, aes(x=x,y=y, color=pointdens)) + geom_point() + scale_colour_gradientn(colours = rainbow(5)) + theme_bw()
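The loess step above fits a second smooth surface to the gridded density, so its predictions only approximate the kde2d values. A possible alternative (my substitution, not Irsan's method) is to interpolate the kde2d grid bilinearly at each point. The interp_kde() helper below is hypothetical, written for this example; its output could stand in for DF$pointdens in the ggplot call above:

```r
library(MASS)  # kde2d()

set.seed(1)
n <- 10000
x <- rnorm(n)
y <- rnorm(n)
dens <- kde2d(x, y, n = 50)

# Hypothetical helper: bilinear interpolation of the kde2d grid at points (px, py).
# Assumes every point lies inside the grid, which holds for kde2d's default lims.
interp_kde <- function(dens, px, py) {
  gx <- dens$x
  gy <- dens$y
  # Grid cell containing each point (clamped so that i + 1 and j + 1 exist)
  i <- findInterval(px, gx, all.inside = TRUE)
  j <- findInterval(py, gy, all.inside = TRUE)
  # Fractional position of each point inside its cell, in [0, 1]
  tx <- (px - gx[i]) / (gx[i + 1] - gx[i])
  ty <- (py - gy[j]) / (gy[j + 1] - gy[j])
  z <- dens$z
  # Weighted average of the four surrounding grid values
  (1 - tx) * (1 - ty) * z[cbind(i, j)] +
    tx * (1 - ty) * z[cbind(i + 1, j)] +
    (1 - tx) * ty * z[cbind(i, j + 1)] +
    tx * ty * z[cbind(i + 1, j + 1)]
}

pointdens <- interp_kde(dens, x, y)

# Base-graphics preview: colour points by interpolated density
pal <- colorRampPalette(c("blue", "yellow", "red"))(50)
plot(x, y, pch = 19, cex = 0.4, col = pal[cut(pointdens, 50)])
```

At the grid points this reproduces dens$z exactly, and between them it varies linearly, so there is no extra smoothing step to tune.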
Answer written 11.6 years ago by Irsan; updated 5.0 years ago by Ram.
@Irsan: This is what I would like to generate for my gene expression data, x and y (each 15000 genes x 4 samples), to show the correlation between all gene pairs in x and y. Could you please suggest how I should modify my data to get a density-coloured scatterplot of gene expression?
My data are initially two data frames of dimension 15000 x 4 each, where the rows are genes and the columns are samples. For these two data frames, I would like to make a density scatterplot of gene correlation.
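A possible way to do this, assuming both data frames list the same genes in the same row order and hold raw (non-log) expression values, is to reduce each gene to one x value and one y value (here the mean log2 expression across the four samples) and colour by local density. The simulated x_df and y_df below are hypothetical stand-ins for your data, and the log2(value + 1) transform is an assumption. With LSD loaded, heatscatter(mx, my) should work on the same two vectors; grDevices::densCols() is used here only so the sketch runs without LSD (it relies on KernSmooth, which ships with standard R installs):

```r
set.seed(1)
n_genes <- 15000
genes <- paste0("gene", seq_len(n_genes))

# Hypothetical stand-ins for the two 15000 x 4 data frames (genes x samples)
base <- matrix(rlnorm(n_genes * 4, meanlog = 5), ncol = 4)
x_df <- as.data.frame(base)
y_df <- as.data.frame(base * rlnorm(n_genes * 4, sdlog = 0.3))
rownames(x_df) <- rownames(y_df) <- genes

# The genes must be in the same order in both tables
stopifnot(identical(rownames(x_df), rownames(y_df)))

# log2(value + 1) is an assumption; skip it if your data are already on a log scale
lx <- log2(as.matrix(x_df) + 1)
ly <- log2(as.matrix(y_df) + 1)

# Option A: one point per gene (mean across the 4 samples)
mx <- rowMeans(lx)
my <- rowMeans(ly)

# Option B: one point per gene-sample pair (15000 * 4 points);
# pass vx, vy to densCols() and plot() in the same way
vx <- as.vector(lx)
vy <- as.vector(ly)

# Colour by local point density
cols <- densCols(mx, my, colramp = colorRampPalette(c("blue", "yellow", "red")))
plot(mx, my, col = cols, pch = 19, cex = 0.3,
     xlab = "x: mean log2 expression", ylab = "y: mean log2 expression")

# Rank correlation across genes
r_genes <- cor(mx, my, method = "spearman")
print(r_genes)
```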
This may also help with future LSD work if you haven't seen it already: http://cran.fhcrc.org/web/packages/LSD/LSD.pdf
This is awesome! I wish I could save the plots as ggplot objects, though.