Tool: HeatmapGenerator: High performance RNAseq and microarray visualization software suite to examine differential gene expression levels using an R and C++ hybrid computational pipeline
9.9 years ago

Dear Community,

Please take a look at HeatmapGenerator: http://www.scfbm.org/content/9/1/30

HeatmapGenerator is a graphical user interface software program written in C++, R, and OpenGL to create customized gene expression heatmaps from RNA-seq and microarray data in medical research with simple clicks of a button to help you (the researcher) save time and money.

To use HeatmapGenerator, please download either the Apple Mac OS X or Microsoft Windows binaries from here: https://sourceforge.net/projects/heatmapgenerator/

You can easily follow along with me on my YouTube video and see HeatmapGenerator in action:

For interested software developers, HeatmapGenerator source code is available for viewing at: https://github.com/Bohdan-Khomtchouk/HeatmapGenerator

HeatmapGenerator is free software (released under the GNU General Public License) and you are welcome (and encouraged) to contribute to it. Please feel free to get in touch with me at khomtchoukmed@gmail.com, on GitHub, or here on Biostars.

Best regards,
Bohdan Khomtchouk

heatmap microarray R Cpp RNA-Seq • 18k views

Not that it helps at this stage, but the Rcpp package coupled with the shiny package would have allowed you to create the same style of GUI. Here is an example of a web application using both C++ and R code under the hood: https://popgen.shinyapps.io/divMigrate-online/.

Edit: Fixed url

Shiny is for web browser tools only, not standalone executable files. Also, Rcpp does not have the R gplots package prebundled within it, which contains the heatmap.2 command. Relying on the R script batch interpreter, as I have done in http://www.scfbm.org/content/9/1/30, which already includes every necessary package in R, obviates the need for Rcpp.

You can try to set up a local Shiny server, although on Windows you'll need to run a virtual machine for that purpose (see https://github.com/leondutoit/shiny-server-on-ubuntu).

There is also shinyapps, which just went beta. It is trivial to bundle all dependencies (even GitHub-only packages) and upload them to the server.

I didn't realize it was a standalone executable, which is pretty cool. However, I'm not sure I understand the point about Rcpp not having gplots prebundled with it. Rcpp is just a package that allows the integration of C++ into R commands. Specifically, if you need speed, you can write a function in C++, and Rcpp provides methods for importing this function into the R environment, so it can be called within your existing R script in much the same way as a pure R function. For example, the app I originally linked to uses multiple pure C++ functions, as well as additional packages (e.g. qgraph, diveRsity), which are all just bundled into a 'shinyapp'. This can then be uploaded to shinyapps.io (for free at the moment), where it is hosted and provides a self-contained, live web application. Just a nice alternative approach, I think.

9.9 years ago
NHEJ ▴ 360

Giving wet-lab people an opportunity to make heatmaps directly in R without needing to know how to program in R is broadly useful for RNA-seq and microarray research. I've never seen programming languages get synthesized together like that before, so I googled "hybrid computational pipeline" and didn't come up with any more bioinformatics results besides your paper. As far as I can tell, it's a whole new bioinformatics programming paradigm. As this software development style is likely to be of general interest to computational biologists, may I ask how you went about it and how you accomplished this hybridity (in layman's terms)?

I'm not sure if it's a new paradigm, but the quick answer to your question would be: OpenGL. OpenGL serves kind of like the "glue" between R and C++. I knew from the beginning that R would be used for making the heatmap and that a fast and efficient GUI for point-and-click action needed to be designed around the R language (so that non-computational people could use the program). Lines 164-231 in the Mac OS X version on my GitHub account show the R code, and everything built around it is a mixture of C++ and OpenGL code. After installing HeatmapGenerator on your computer, you can quickly make any kind of customized heatmap (this is faster than altering and re-running an R script), and you don't need programming experience to do so (so you don't need to outsource your data).

9.9 years ago
Irsan ★ 7.8k

Nice one! Does it support:

  1. Clustering of rows and/or columns and visualization of dendrogram(s)?
  2. Manipulating dendrograms, such as flipping branches and pruning and coloring subbranches?
  3. Color-coded annotation of rows and/or columns of categorical, ordinal and nominal variables?
  4. Adding color legends for the color-value mappings of the row/column annotation?
  5. Choosing from Brewer palettes for row/column annotation?
  6. Saving as .pdf or .svg so images can be edited easily?

In addition, 3D visualization of PCA would be a nice add-on.

+1 for good suggestions of future add-ons for the community. I suggest pursuing this order: #1 (very quick, just activate heatmap.2 feature and create widget), #6 (quick add-on, maybe even quicker than #1), #5 (brewer palettes as simple add-on), then everything else. And all these popular R utilities with simple clicks of a button for wet-lab researchers!
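
For readers who want to see what suggestions #1, #5 and #6 amount to on the R side, here is a minimal sketch. It is not HeatmapGenerator's code: it uses base R's `heatmap()` rather than gplots' `heatmap.2` so that it runs without extra packages, and the matrix `expr`, the palette and the output file name are made up for illustration. `heatmap.2` accepts the same `Rowv`, `Colv`, `col` and `scale` arguments.

```r
# Hypothetical stand-in for an expression table: 20 genes x 6 samples
set.seed(1)
expr <- matrix(rnorm(120), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("sample", 1:6)))

# Suggestion #5: a diverging blue-white-red palette built with base R
# (RColorBrewer::brewer.pal could supply the anchor colours instead)
palette_fn <- colorRampPalette(c("blue", "white", "red"))

# Suggestion #6: write to a vector format so the figure can be edited later
out_file <- file.path(tempdir(), "heatmap.pdf")
pdf(out_file, width = 7, height = 7)

# Suggestion #1: with Rowv/Colv left at their defaults, rows and columns
# are clustered and both dendrograms are drawn
hm <- heatmap(expr, col = palette_fn(64), scale = "row")
dev.off()

# The row order chosen by the clustering is returned invisibly
head(rownames(expr)[hm$rowInd])
```

Swapping `pdf()` for `svg()` gives SVG output on R builds that support that device.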

Also look at the dendroextras library for #3 and at heatmap.3 for part of #4 and #5.

9.9 years ago

All this reminds me of the Quilt plot and BoxPlotR papers. There is basically no problem with drawing a heatmap in R; it takes just several lines of code. heatmap.2 also handles all the clustering, and RStudio provides a nice GUI. Actually, creating heatmaps using R scripts has lots of benefits:

  • you can easily modify / pre-process the data
  • no need to leave the R environment when working with differential expression analysis modules like edgeR
  • you can later re-run your R code, which provides analysis consistency

There are some tricky aspects of building advanced heatmaps with R, e.g. creating complex legends; see "How Do I Draw A Heatmap In R With Both A Color Key And Multiple Color Side Bars?". So why not focus on implementing these kinds of features and extending existing heatmap generation libraries?
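
To make the legend problem concrete, here is a small sketch of a heatmap with one color side bar and a matching legend. It uses base R's `heatmap()`, which takes a single color vector per axis; several side bars at once need an extended function such as the heatmap.3 variants mentioned elsewhere in this thread. The data, group labels and colors are invented for illustration, and the legend position usually needs manual tuning.

```r
# Hypothetical data: 10 genes x 8 samples
set.seed(2)
expr <- matrix(rnorm(80), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("s", 1:8)))

# Hypothetical categorical annotation of the columns
group <- factor(rep(c("control", "treated"), each = 4))
group_cols <- c(control = "grey40", treated = "orange")

# One color per column, in the original column order
# (heatmap() reorders the bar together with the columns)
col_bar <- unname(group_cols[as.character(group)])

out_file <- file.path(tempdir(), "annotated_heatmap.pdf")
pdf(out_file)
heatmap(expr,
        ColSideColors = col_bar,
        col = colorRampPalette(c("blue", "white", "red"))(32))
# Legend for the side bar, added on top of the finished figure
legend("topright", legend = names(group_cols), fill = group_cols,
       title = "Group", bty = "n")
dev.off()
```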

As for the "you don't need to have programming experience" thesis, those gene expression tables do not come from nowhere. You either learn how to program and perform all the analysis yourself, or ask the bioinformatician who has performed the analysis to help you with visualization, etc.

I'll address your answer point by point. First, RStudio is not a GUI; it is an integrated development environment (IDE). That is a huge difference for making heatmaps: a GUI is something a non-computational person can use, whereas an IDE is for computational people, and, frankly, comparing the two is like comparing apples and oranges. Second, it is unclear what exactly you meant by comparing HeatmapGenerator with Quilt plot/BoxPlotR. Have you read these three papers, and do you understand the differences (heatmaps vs. boxplots vs. quilt plots) and their applications as stated in them? Third, drawing heatmaps in R takes much, much more than several lines of code, especially if you want to customize your heatmaps and go back and forth between the various customizations (source: simple Google search). Regarding your comment about the gene expression tables, have you heard of the RNA-seq workflow in Galaxy for non-computational people? For wet-lab people who are analyzing RNA-seq data using Galaxy, HeatmapGenerator is a software package for visualizing their data. Happy reading!

1. In the present setting, the difference between RStudio (IDE) and HeatmapGenerator (GUI) is that in the first case one modifies heatmap.2 parameters with a text editor and in the second case one uses radio buttons, etc. The resulting heatmap window looks the same.

2. I don't think it is wise to divide people into "computational" and "non-computational" ones. It indeed takes some time to learn R basics, yet it leads to a great reproducibility and robustness boost. Lots of my colleagues with purely biological background managed to learn R using RStudio quite fast.

3. As for the "...understand the differences (heatmaps vs. boxplots vs. quilt plots)": first, the quilt plot actually turned out to be the same thing as a heatmap :) Second, I was referring to taking an R function as is and creating a wrapper for it instead of trying to extend it and create a simpler/more flexible R implementation.

4. Actually, a GUI limits the ability to customize heatmaps when compared with R. There is a multitude of parameters / heatmap variants, so this eventually leads to a cumbersome interface that has to be changed completely when a new, nicer version of the heatmap function is released.

5. As for the complications with drawing a heatmap, in your case this is a 4-liner:

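library(gplots)  # provides heatmap.2
df <- read.table("EXAMPLE.txt", header = TRUE, sep = "\t", comment.char = "", stringsAsFactors = FALSE)  # tab-delimited table
row.names(df) <- df[, 1]  # first column holds the row labels
heatmap.2(as.matrix(df[, 2:ncol(df)]))  # cluster and plot the numeric columns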

6. I don't think that the RNA-seq workflow in Galaxy is for "non-computational" people. One has to spend some time learning how to tune the parameters:

"There is no such thing (yet) as an automated gearshift in expression analysis. It is all like stick-shift driving in San Francisco. In other words, running this tool with default parameters will probably not give you meaningful results. A way to deal with this is to understand the parameters by carefully reading the documentation and experimenting. Fortunately, Galaxy makes experimenting easy."

I.e. one should become a "computational" person for a while.

Galaxy is a great framework because it brings together various genomic tools and allows one to organize them into a pipeline using an intuitive workflow creation GUI. It also has a smart task scheduling system, format conversion utilities, and much more. It is not just a wrapper for a command-line utility that facilitates GUI-based parameter selection.
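
The four-liner in point 5 assumes a tab-delimited file named EXAMPLE.txt whose first column holds gene identifiers and whose remaining columns are numeric. A self-contained variant, sketched under that assumption, writes a tiny made-up table to a temporary file first so that it can be run as is. It calls `heatmap.2` when gplots is installed and falls back to base `heatmap()` otherwise.

```r
# Hypothetical example table standing in for EXAMPLE.txt
tab_file <- tempfile(fileext = ".txt")
writeLines(c("gene\tctrl1\tctrl2\ttreat1\ttreat2",
             "geneA\t5.1\t4.8\t9.3\t9.0",
             "geneB\t7.2\t7.5\t2.1\t2.4",
             "geneC\t3.3\t3.0\t3.6\t3.2"),
           tab_file)

df <- read.table(tab_file, header = TRUE, sep = "\t",
                 comment.char = "", stringsAsFactors = FALSE)

# Numeric matrix with gene identifiers as row names
mat <- as.matrix(df[, -1])
rownames(mat) <- df[, 1]

pdf(file.path(tempdir(), "four_liner.pdf"))
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat)
} else {
  heatmap(mat)  # base R fallback when gplots is not installed
}
dev.off()
```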

Mikhail, I think what you're missing here is the point that many a wet-lab expert will not want to write or even want to understand the code snippet you wrote in #5 to make a heatmap. They'll just give it to someone else. Just to get your code in #5 off the ground you need to set your working directory in R, etc., before you even start anything. Try explaining that to people who just use Excel and need a heatmap the next day. That's not the point. The point is that even if you do what you suggested in #1 (modify parameters with a text editor), you are just writing one-off scripts, with no reusability or sustainability, which are critical software development pillars. If you know R then that's great, but not every wet-lab person knows R (to say the least) or has any interest in learning it, or wants to modify one-off scripts each time (particularly if they don't understand how those scripts are put together).

I agree with NHEJ and Bohdan here. I think there is a need for a GUI for visualization of expression heatmaps. However, I also think that HeatmapGenerator in its current form is not useful for biologists. Biologists will soon notice that a lot of their requirements are not met (see my suggestions for improvements above) and will therefore give it to a bioinformatician anyway. If they had learnt R, they could do anything they like, but most of the biologists I know don't want to spend time on learning R.

I see your point. I would also suggest separating data visualization from the publication-ready figures produced by R. In order to explore data with the heatmap function from R, one needs to be able to do subsetting, normalization, etc. The best way to do this would be to learn some basic R functions and use RStudio. If one wants to visualize heatmaps and explore them (i.e. have the ability to zoom, scroll, etc.), why not just log2-transform the data and use good old TMEV?
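
As an illustration of the subsetting, normalization and log2 steps mentioned above, here is a minimal base R sketch. The count matrix is simulated, and the pseudocount of 1, the cut-off of 10 genes and the row-wise z-scoring are illustrative choices, not recommendations from this thread.

```r
# Hypothetical raw counts: 50 genes x 6 samples
set.seed(3)
counts <- matrix(rpois(300, lambda = 100), nrow = 50,
                 dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))

# log2 transform; the pseudocount avoids log2(0)
log_expr <- log2(counts + 1)

# Subsetting: keep the 10 genes with the highest variance across samples
gene_var <- apply(log_expr, 1, var)
top <- log_expr[order(gene_var, decreasing = TRUE)[1:10], ]

# Normalization: row-wise z-scores, so each gene has mean 0 and sd 1
z <- t(scale(t(top)))
```

The matrix `z` can then be passed to a heatmap function with its own scaling switched off (`scale = "none"`).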

I agree that biologists are not always willing to spend time learning R. However, it is indeed better to give the task to someone else if a person doesn't want to get into the details. If one "...need[s] a heatmap the next day" and has no bioinformatics background, it is better to consult a bioinformatician. Without a thorough understanding of what is going on, there is a high likelihood that the analysis results are going to be flawed.

Scripts offer plenty of reusability and sustainability if they are properly organized. That is the key point: they state exactly which heatmap parameters were used, whereas in the case discussed here it is not straightforward to deduce the parameters of the heatmap function. Will you be able to tell exactly what data was used and how it was processed by looking at a set of .png images and a pile of text files after a year?
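
One way to organize a script along these lines is to wrap the plotting call in a function that stores the settings next to the figure. The sketch below is only an illustration: `save_heatmap`, its arguments and the file naming scheme are hypothetical, and it uses base `heatmap()` so that it runs without gplots.

```r
# Hypothetical helper: draws a heatmap and records the settings used beside it
save_heatmap <- function(mat, out_prefix, scale = "row",
                         hclust_method = "complete") {
  params <- list(scale = scale,
                 hclust_method = hclust_method,
                 n_rows = nrow(mat),
                 n_cols = ncol(mat),
                 created = format(Sys.time()),
                 r_version = R.version.string)

  pdf(paste0(out_prefix, ".pdf"))
  heatmap(mat, scale = scale,
          hclustfun = function(d) hclust(d, method = hclust_method))
  dev.off()

  # dput() writes the list as R code that dget() can read back later
  dput(params, file = paste0(out_prefix, "_params.R"))
  invisible(params)
}

# Usage with made-up data
set.seed(4)
demo <- matrix(rnorm(60), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("s", 1:6)))
prefix <- file.path(tempdir(), "run1")
save_heatmap(demo, prefix, hclust_method = "average")

# A year later, the settings can be recovered from the sidecar file
restored <- dget(paste0(prefix, "_params.R"))
restored$hclust_method
```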
