Hi,
this is a follow-up from this Question.
I have created a plot using the ggplot2 package. But as the matrix is very large (almost 146000 rows), the single cells of the image are quite small.
I would like to know how to make the single cell sizes bigger, so I will get a better overview inside the image about the differences between the different cells.
I would also like to know whether it is possible to make a bigger (longer) legend with more than just five elements. I would like to have at least 20 clearly distinguishable colour groups.
( a logical question - why does it plot it in a triangle?)
This is how I create the plot:
require(ggplot2)
pl1 <- ggplot(data, aes(y = partner1, x = partner2)) +
  geom_tile(aes(fill = Substract)) +
  scale_fill_continuous(low = "blue", high = "green") +
  scale_size(range = c(1, 20000))
Thanks
Assa
By "legend", do you instead mean tick labels? The actual legend is the color bar on the right.
BTW, the answer to your logic question is that your data is triangular in shape (well, partner1~partner2 is). This is likely due to how you generated those values.
Edit: You can test what I mentioned above regarding the shape with
with(data, table(partner1 > partner2))
Actually I do mean the bar at the side of the image :-)
If you just want more colors then try scale_fill_gradientn().
I wonder whether making such a heatmap of your data is the best way to visualize the results in the first place. In the heatmap, genomic positions are converted to non-numeric values, which makes it hard to see the relative distances along the genome. Furthermore, I think there is too much data per pixel, and the distribution of the "Substract" values is strongly skewed towards the lower numbers, which makes it hard to tell the colors apart (maybe you should log-transform the values).
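As a rough sketch of the two suggestions above (more colours via scale_fill_gradientn() and a log transform), the following uses a small made-up data set in place of the real one. The column names partner1, partner2 and Substract come from the question; the toy values, the helper column logSub, the +1 offset before the log, and the choice of palette are assumptions made only for illustration.

```r
library(ggplot2)

# Toy stand-in for the real data (assumption: lower triangle only,
# values skewed towards small numbers, as described in the thread).
set.seed(1)
toy <- expand.grid(partner1 = 1:30, partner2 = 1:30)
toy <- toy[toy$partner1 >= toy$partner2, ]
toy$Substract <- rexp(nrow(toy), rate = 0.05)

# Log-transform so that the many small values spread over more of the
# colour range; the +1 avoids log10(0).
toy$logSub <- log10(toy$Substract + 1)

# 20 colours for the gradient, and more tick labels on the colour bar.
pal <- hcl.colors(20)                       # needs R >= 3.6
legend_breaks <- pretty(toy$logSub, n = 10)

pl <- ggplot(toy, aes(x = partner2, y = partner1)) +
  geom_tile(aes(fill = logSub)) +
  scale_fill_gradientn(colours = pal, breaks = legend_breaks)

# print(pl)  # draw the plot in an interactive session
```

Note that the colour bar is still continuous here; the breaks only control how many values are labelled along it.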
But what do you really want to show to the viewer? The things I can think of are:
In any case, it is very hard to answer with the tile-plot you are trying to make.
In case of 1 and 2, make a karygram overview of the distribution (histogram) of interactions along the chromosomes. Then use
geom_histogram() + facet_wrap(~chr,...)
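A minimal sketch of that histogram idea, assuming a table with one row per interaction and columns chr and pos. Both column names, the simulated positions, and the bin width are hypothetical and would need to be adapted to the real data:

```r
library(ggplot2)

# Hypothetical interaction table: chromosome name and genomic position.
set.seed(2)
inter <- data.frame(
  chr = rep(c("chr1", "chr2"), each = 500),
  pos = c(runif(500, 0, 2e6), runif(500, 0, 1.5e6))
)

# One histogram panel per chromosome, stacked vertically so that the
# position axes are easy to compare.
ph <- ggplot(inter, aes(x = pos)) +
  geom_histogram(binwidth = 5e4) +
  facet_wrap(~chr, ncol = 1, scales = "free_x")

# print(ph)  # draw the plot in an interactive session
```

With a single chromosome the facet_wrap() call is not needed, and the bin width could match the 1000 or 5000 position bins already used.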
In case of 3, compute the pairwise correlation coefficients between all positions and make a heatmap of them (see "How Do I Draw A Heatmap In R With Both A Color Key And Multiple Color Side Bars?").
These are not really continuous genomic positions, but bins of 1000 (or 5000) positions summarized into one value, and I have only one chromosome. I thought a heatmap would be better because of the (coloured) overview it gives. (Is there a way to draw a histogram with different colours for specific value ranges?) I will try the histogram. BTW, did you mean karyogram? (Is it a GRanges object?)