How to create Next-Generation Clustered Heat Maps using Galaxy

This article describes how to create a Next-Generation Clustered Heat Map (NG-CHM) using our Galaxy NG-CHM Builder. Galaxy is a system that enables researchers without informatics expertise to perform computational analyses through the web. We also have a video describing this material.

Before you begin

You will need access to a Galaxy server with the NG-CHM builder and visualization components installed. A separate article describes how to install them if you have administrative access to a Galaxy server.

Make sure that you are logged in.

Importing data

To create a heat map, you will need the data you want to visualize available as a matrix with row and column labels.

You can import a tab-delimited data matrix file by:

clicking the import button at the top left of the tools menu,
selecting the desired file, and
clicking start to bring it into the system.

When the file turns green, it is ready to be used.
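If your data is not yet in a file, a matrix of this shape can be produced with a few lines of Python. The following is a minimal sketch, not part of the NG-CHM tools: the function name `write_matrix_tsv`, the gene and sample labels, and the empty top-left corner cell are all assumptions made for illustration, so check the layout against what your builder expects.

```python
import csv
import io


def write_matrix_tsv(handle, row_labels, col_labels, values):
    """Write a labeled matrix as tab-delimited text.

    The first line holds an empty corner cell (an assumption) followed by
    the column labels; each later line holds a row label and its values.
    """
    if len(values) != len(row_labels):
        raise ValueError("one row of values is needed per row label")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(col_labels))
    for label, row in zip(row_labels, values):
        if len(row) != len(col_labels):
            raise ValueError(f"row {label!r} has the wrong number of values")
        writer.writerow([label] + list(row))


# Hypothetical example data: two genes measured in three samples.
buffer = io.StringIO()
write_matrix_tsv(
    buffer,
    row_labels=["GeneA", "GeneB"],
    col_labels=["Sample1", "Sample2", "Sample3"],
    values=[[1.5, 2.0, 0.3], [0.7, 1.1, 2.4]],
)
matrix_text = buffer.getvalue()
print(matrix_text)
```

To create a file for import, pass an open file handle (for example `open("matrix.tsv", "w", newline="")`) instead of the in-memory buffer.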

Running the NG-CHM generator tool

Select the NG-CHM heat map creation tool from the menu on the left. This will bring up the settings dialog box.

At the top of the dialog box, select your original data matrix file.

Under that is an input field for the heat map name, which is viewable in the heat map viewer.

Below that there is an input field for an optional, longer description of the heat map. The description will be displayed in a tooltip pop-up on the viewer.

The next option selects the data summarization method. For matrices with more than a thousand rows or columns, this method is used to condense the data for the summary view. Average is generally the best choice, unless the map contains categorical data, in which case mode is the better choice.
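The difference between the two summarization methods can be illustrated with a small sketch. This is not the builder's actual implementation: the `summarize` function, the block sizes, and the sample values are assumptions chosen only to show why average suits numeric data and mode suits categories.

```python
from statistics import mean, mode


def summarize(values, method="average", block=2):
    """Condense a list by replacing each run of `block` values with one value."""
    reducer = {"average": mean, "mode": mode}[method]
    return [reducer(values[i:i + block]) for i in range(0, len(values), block)]


# Hypothetical numeric data: averaging adjacent values keeps the overall trend.
numeric = [1.0, 3.0, 10.0, 20.0]
averaged = summarize(numeric, "average", block=2)

# Hypothetical categorical data: an average is undefined here, so the most
# common category in each block is used instead.
categorical = ["high", "high", "low", "high", "low", "low"]
most_common = summarize(categorical, "mode", block=3)

print(averaged)
print(most_common)
```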

Row and column clustering

Below the data summarization option, the options for rows and columns are separated into two identical subgroups. This enables you to cluster each axis independently.

The ordering method specifies how the rows or columns are ordered. Select clustering to order them by hierarchical clustering; the other options are to keep the order of the data in the original matrix or to randomize it.

If clustering is selected, a distance metric is needed to measure the similarity of any two rows or columns, and a clustering method is needed to specify the clustering algorithm. Details on these clustering options are available in the documentation for the R hclust function. Generally, the Ward clustering method is a good default.
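To show how a distance metric and a clustering method together produce an ordering, here is a deliberately simple sketch in pure Python. It uses Euclidean distance with naive average linkage, not the Ward method and not the hclust implementation that the tool relies on; the function names and the sample vectors are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(labels, vectors, distance=euclidean):
    """Return labels in the order produced by naive average-linkage clustering."""
    # Each cluster is a list of item indices; start with one cluster per item.
    clusters = [[i] for i in range(len(labels))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        pairs = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return [labels[i] for i in clusters[0]]


# Hypothetical rows: r1 and r3 are near each other, as are r2 and r4.
labels = ["r1", "r2", "r3", "r4"]
vectors = [[0, 0], [10, 10], [0, 1], [10, 11]]
order = cluster_order(labels, vectors)
print(order)
```

Similar rows end up next to each other in the resulting order, which is what makes clustered heat maps easier to read than maps in the original or a random order.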

It is easy in Galaxy to rerun the tool and try out other distance measures or clustering methods.

Covariate bars

Covariate bars are annotations associated with the rows or columns, such as smoker status, age, or gender. Adding covariate bars to a heat map can provide insights about the members of the various clusters in the map. The generator tool allows you to add any number of row and/or column covariate bars.

To add a covariate bar, click on the insert button. The name entered here will be displayed as the covariate name on the heat map. The covariate bar file needs to be loaded into Galaxy before running the tool. Like the data matrix, the covariate file is tab-delimited, and it should contain exactly two columns. In each row of the file, the first column holds a heat map row label (for a row covariate) or column label (for a column covariate). The second column holds the value for that label: a number for a continuous covariate or a category name for a categorical one.

Each covariate file must contain a row for every corresponding row label or column label in the heat map.
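Before uploading a covariate file, it can be useful to check that it covers every label. The sketch below is not part of the NG-CHM tools; the function name `missing_covariate_labels`, the sample labels, and the assumption that the file has no header row are all hypothetical.

```python
import csv
import io


def missing_covariate_labels(covariate_text, heatmap_labels):
    """Return heat map labels that have no row in a two-column covariate file."""
    reader = csv.reader(io.StringIO(covariate_text), delimiter="\t")
    seen = set()
    for line_number, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 2 columns, got {len(fields)}"
            )
        seen.add(fields[0])
    return [label for label in heatmap_labels if label not in seen]


# Hypothetical column covariate file (no header row assumed) and the
# column labels of the matching heat map.
covariate_text = "Sample1\tsmoker\nSample2\tnon-smoker\n"
columns = ["Sample1", "Sample2", "Sample3"]
missing = missing_covariate_labels(covariate_text, columns)
print(missing)
```

An empty result means every heat map label has a covariate value; any labels returned need a row added to the covariate file.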

Generating the heat map

Next, click execute and your file will show up in the history on the right.

Displaying the heat map

The file will turn green when the heat map is ready to be viewed. Next, click on the file, then at the bottom of that dialog box click the graph icon. Select the NG-CHM heat map option.
