
1. GSEA Tutorial – Overview

The GSEA Desktop Application Tutorial provides a brief overview of the main features of the GSEA application. It is organized as a series of slides, which you can step through by pressing “Next”, or you can jump to any section of interest using the links to the left. For more detailed information, see the Documentation page.

the install screen for GSEA

 

2. GSEA Tutorial – Ways to Run GSEA

You can run GSEA in multiple ways:

  1. The GSEA desktop application provides an easy-to-use graphical interface. When you launch the application from the download page of the GSEA web site, as you will do in this tutorial, you are using Java Web Start technology (http://java.sun.com/products/javawebstart/) to download, install, and start the application.
  2. The GSEA .jar file provides command line access to GSEA and allows you to run the GSEA desktop application without being connected to the internet. You can download the .jar file from the download page of the GSEA web site.
  3. R-GSEA makes GSEA available from the R programming environment.
  4. A GSEA GenePattern module makes GSEA available from GenePattern.

 

 

3. GSEA Tutorial – Launching GSEA

To launch GSEA:

  1. Go to the Downloads page.
  2. Register as instructed.
  3. Click the Launch icon to start the GSEA Desktop Application.

When GSEA starts, the main window appears. The main components of the user interface are:

  1. The navigation bar on the left, which provides quick access to common GSEA operations.
  2. The Processes panel in the bottom left corner, which provides information about the status of your analyses.
  3. The main panel on the right, which is used to display dialogs and results. When you start GSEA, the main panel displays the Home page. As you open new pages, tabs appear next to the Home tab. To close a page, click the close (X) icon on the tab.

the startup screen for GSEA

4. GSEA Tutorial – Loading Data

Click the Load Data icon in the navigation bar. The Load Data page appears. You use this page to load your data files: expression datasets, phenotype labels (e.g., tumor vs. normal), gene sets, and chip annotations. Once imported, these files are stored in memory and are available to the program for analysis.

GSEA-supported data files are tab-delimited ASCII text files with special file extensions that identify them. For example, expression data usually has the extension *.gct, phenotypes *.cls, gene sets *.gmt, and chip annotations *.chip. Click the More on file formats help button to view detailed descriptions of all the data file formats.
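As a rough illustration of how the extensions identify the file types, the mapping above can be written as a small lookup. This is only a sketch: the function name and the example file names are hypothetical, and it covers just the four extensions named in this section.

```python
from pathlib import Path

# Extensions named in this section, mapped to the kind of data they hold.
FILE_KINDS = {
    ".gct": "expression dataset",
    ".cls": "phenotype labels",
    ".gmt": "gene sets",
    ".chip": "chip annotations",
}


def describe_gsea_file(filename):
    """Return the kind of GSEA data file suggested by the file extension."""
    suffix = Path(filename).suffix.lower()
    return FILE_KINDS.get(suffix, "unknown")


# Hypothetical file names, used only for illustration.
for name in ["my_expression.gct", "my_phenotypes.cls", "notes.txt"]:
    print(name, "->", describe_gsea_file(name))
```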

load expression, phenotype, and gene set data into GSEA

GSEA provides several ways to load data:

  1. Click the Browse for files button. When the Open window appears, select the file(s) to load and then click the Open button. To select multiple files, use SHIFT-click or CTRL-click.
  2. Click the Load last dataset used button. GSEA loads the data used in the most recent gene set enrichment analysis.
  3. Drag-and-drop the files from a file browser window into the drag-and-drop pane. When the files that you want to load are listed in that pane, click the Load these files button. To remove files from the drag-and-drop pane, click the Clear button.
  4. The Recently Used Files pane contains files that you have used previously. (The first time you start GSEA, this pane is empty.) Double-click a file to load it.

The Object Cache pane lists the data that you have loaded into memory.

5. GSEA Tutorial – Loading the P53 Sample Data

The GSEA web site provides several sample datasets that correspond to results from the GSEA paper (Subramanian & Tamayo, PNAS 2005). For the tutorial, you will use the P53 sample data.

To download the P53 sample files:

  1. Go to the Datasets page.
  2. Download the three p53 data files. For each file: right-click on the file, select Save link as and save the file to your local drive.
  3. Confirm that the saved files have a .gct or .cls file extension. If a .txt file extension has been appended, remove it.
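If you prefer to check the extensions with a script, the rule in step 3 could be sketched as follows. The helper name and the file names are hypothetical, and the function only computes the corrected name; it does not rename anything on disk.

```python
from pathlib import Path

# Extensions that the downloaded sample files are expected to have.
GSEA_EXTENSIONS = {".gct", ".cls"}


def corrected_name(filename):
    """Drop a trailing .txt that was appended to a .gct or .cls file name."""
    path = Path(filename)
    inner_suffix = Path(path.stem).suffix.lower()
    if path.suffix.lower() == ".txt" and inner_suffix in GSEA_EXTENSIONS:
        return path.stem  # e.g. "data.gct.txt" becomes "data.gct"
    return path.name  # already fine, or not a GSEA file at all


# Hypothetical file names, used only for illustration.
for name in ["sample.gct.txt", "sample.cls", "readme.txt"]:
    print(name, "->", corrected_name(name))
```

To apply the change, you could pass the result to `Path.rename` for each file in your download folder.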

p53 sample files

To load the P53 data into GSEA:

  1. Go to the Load Data page of the GSEA application.
  2. Click Browse for files.
  3. Select the three files that you just downloaded.
  4. Click Open.

load sample data into GSEA

6. GSEA Tutorial – Analysis Parameters

Now that you have loaded your data files, you are ready to run the gene set enrichment analysis. Click the Run GSEA icon in the navigation bar. The Run GSEA page displays the parameters for the analysis. There are three categories of parameters:

  1. Required: Essential parameters which you must specify before the analysis can be run.
  2. Basic: Additional parameters with standard defaults. Typically, accepting the defaults is fine. Click Show to see these parameters.
  3. Advanced: Parameters that control further details of the GSEA algorithm and the Java implementation. Most users do not need to change these. Click Show to see these parameters.

For descriptions of the parameters, click the ? help button.

how to use the GSEA parameters

7. GSEA Tutorial – Running the Gene Set Enrichment Analysis

To run the analysis, set the parameters and click the Run button.

how to use the GSEA parameters

  1. Use the drop-down selector to pick the p53_hgu95av2 dataset.
  2. Use the button to pick one or more gene sets. GSEA displays a window that lists gene sets in a number of different tabs. For this example, on the GeneMatrix (from website) tab, select c1.v2.symbols.gmt.
  3. Type in or choose the number of permutations to perform. Typically, you start with a small number (perhaps 5) and, when that completes successfully, try a full set of 1000 permutations. For now, choose 5.
  4. Use the button to pick a phenotype. In this sample data, the two choices (MUT_vs_WT or WT_vs_MUT) describe the same comparison.
  5. Use the button to select the chip annotation file that matches the probe identifiers in your expression dataset. For this example, on the Chips (from website) tab, choose the HG_U95Av2.chip file.
  6. Leave the Collapse dataset to gene symbols parameter set to true. This indicates that you want the probe sets in your dataset collapsed to gene symbols.
  7. Leave the Permutation type parameter set to phenotype.
  8. Click Run to start the analysis.

8. GSEA Tutorial – Keeping Identifiers Consistent Between Platforms

Typically, the gene or probe identifiers in your expression dataset are the probe identifiers for the DNA chip array used to produce the data. When running the gene set enrichment analysis, it is critical that all of your data files use the same gene or probe identifiers. You can either use the probe identifiers native to your expression dataset, or collapse each probe set into a gene vector and use HUGO gene symbols as your identifiers.

When you run the gene set enrichment analysis, the value you choose for the Collapse dataset to gene symbols parameter tells GSEA which identifiers you want to use:

  1. Choose true (default) to have GSEA collapse each probe set in your expression dataset into a single gene vector, which is identified by its HUGO gene symbol. In this case, you are using HUGO gene symbols for the analysis. The gene sets that you use for the analysis must use HUGO gene symbols to identify the genes in the gene sets.
  2. Choose false to use your expression dataset “as is.” In this case, you are using the probe identifiers that are in your expression dataset for the analysis. The gene sets that you use for the analysis must also use these probe identifiers to identify the genes in the gene sets.

Collapsing the probe sets eliminates multiple probes for the same gene, which can inflate enrichment scores, and makes the gene set enrichment analysis results easier to interpret biologically. Therefore, the GSEA team recommends leaving the default value for this parameter.
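To make the idea of collapsing concrete, here is a minimal sketch that merges several probes for one gene into a single vector. It assumes the per-sample maximum across probes is kept; this tutorial does not say which rule GSEA applies, so treat that choice, the function name, and the probe identifiers as illustrative assumptions.

```python
from collections import defaultdict


def collapse_to_symbols(expression, probe_to_symbol):
    """Collapse probe-level rows into one row per gene symbol.

    expression: {probe_id: [value per sample]}
    probe_to_symbol: {probe_id: gene symbol}, as a chip annotation would give
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if symbol is None:
            continue  # probe has no annotation, so it cannot be collapsed
        grouped[symbol].append(values)
    # Assumption: keep the largest value per sample across a gene's probes.
    return {
        symbol: [max(column) for column in zip(*rows)]
        for symbol, rows in grouped.items()
    }


# Hypothetical probes and values for two samples.
expression = {
    "probe_1": [5.0, 7.0],
    "probe_2": [6.0, 4.0],
    "probe_3": [1.0, 2.0],
}
probe_to_symbol = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "GENE_X"}

collapsed = collapse_to_symbols(expression, probe_to_symbol)
print(collapsed)
```

After collapsing, the two TP53 probes contribute one row instead of two, so the gene is counted once when a gene set is scored.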

9. GSEA Tutorial – Viewing Program Progress and Results

Use the Processes panel at the lower left corner to view the status of analyses run in this session, including the currently running analysis:

view progress of GSEA algorithm

  1. The blue Running label indicates the currently running analysis. You can click on this label to pause or stop an analysis, as shown in the next slide.
  2. If a red Error appears, click on it for a description of the error. If you need help resolving an error, include this error text in an e-mail message to gsea@broadinstitute.org.
  3. When the analysis completes, click the green Success label to display the results in a web browser. For help interpreting the results, see Interpreting GSEA Results in the GSEA User Guide.
  4. Click the analysis name to view the parameters used in the analysis (a new Run GSEA page appears, which you can use to re-run the analysis).
  5. Click the status bar at the bottom of the window to display the execution log, which shows analysis progress (for example, the number of permutations completed).

10. GSEA Tutorial – Stopping or Pausing a Running Analysis

  1. Click the blue Running label to display the thread control panel.
  2. You can pause the analysis or change the amount of the computer’s processor being used for the analysis.

how to pause a running analysis in GSEA

11. GSEA Tutorial – Running the Leading Edge Analysis

After running a gene set enrichment analysis, you can use the leading edge analysis to examine the genes in the leading edge subsets of selected enriched gene sets. Genes that appear in multiple subsets are more likely to be of interest than those that appear in only one.
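The idea of ranking genes by how many leading edge subsets they appear in can be sketched in a few lines. This is not the application's own implementation; the function name, gene set names, and subsets below are made up for illustration.

```python
from collections import Counter


def genes_by_subset_count(leading_edges):
    """Rank genes by the number of leading edge subsets that contain them.

    leading_edges: {gene_set_name: [genes in that set's leading edge subset]}
    Returns (gene, count) pairs, most frequent first, ties in name order.
    """
    counts = Counter()
    for genes in leading_edges.values():
        counts.update(set(genes))  # count each gene once per subset
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical leading edge subsets for three gene sets.
leading_edges = {
    "SET_A": ["TP53", "MDM2", "BAX"],
    "SET_B": ["TP53", "BAX"],
    "SET_C": ["TP53", "CDKN1A"],
}

ranking = genes_by_subset_count(leading_edges)
for gene, count in ranking:
    print(gene, count)
```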

To run a leading edge analysis, click the Leading Edge Analysis icon on the GSEA main page. When GSEA displays the Leading Edge Analysis page:

  1. Click the button to select a Gene Set Enrichment Report from the application cache (analyses that you have run).
  2. Click the Load GSEA Results button to display the gene sets that were analyzed in that report.
  3. SHIFT-click or CTRL-click to select the gene sets to analyze. For this example, click the FDR column header to order the gene sets by FDR and select the 11 gene sets with an FDR < 0.01.
  4. Click the Run leading edge analysis button to start the analysis.
  5. The analysis displays four graphs showing the overlap among the leading edge subsets of the selected gene sets. For help interpreting the results, see Interpreting Leading Edge Analysis Results in the GSEA User Guide.

running a leading edge analysis in GSEA

12. GSEA Tutorial – Browsing MSigDB Gene Sets

The power of the gene set enrichment analysis is a function of how well your gene sets represent meaningful coordinated or concordant gene expression behavior that reflects actual biological processes or states. You are welcome to use curated gene sets from the Molecular Signatures Database (MSigDB), which is maintained by the GSEA team.

You can browse the MSigDB from the Molecular Signatures Database page of the GSEA web site or the Browse MSigDB page of the GSEA application. To browse the MSigDB from the GSEA application:

  1. Click the Browse MSigDB icon in the navigation bar. An empty Browse MSigDB page appears.
  2. Click the Load database button to display the latest MSigDB gene sets.

Download gene sets

From this page you can:

  1. Use the fields at the top of the page to filter the gene sets displayed in the table.
  2. Select a gene set from the table and right-click to display information about the gene set.
  3. When the table displays the gene sets that you are interested in, export the selected gene sets to a gene set file.

GSEA exports the gene set files to your default output folder (Help>Show GSEA Output Folder). The gene set files are tab-delimited ASCII text files that can be viewed in Excel or Notepad.

13. GSEA Tutorial – Viewing Analysis History

Click the Analysis History icon in the navigation bar to display the Analysis History page, which records and displays analyses that you have run. The left panel lists the reports run in the current session and organizes previously run reports by date. Click on an analysis in the left panel to display information about that analysis in the right panel.

show old parameters, files, and re-run analyses in GSEA

In the right panel of the Analysis History page:

  1. You can view the parameters used in the analysis.
  2. You can choose to re-run an analysis with the exact same set of parameters by clicking the Show in ToolRunner button.
  3. You can choose to automatically load or not load data from the previous analysis (perhaps you are on a different computer or are only interested in the previous parameters to use with different datasets).
  4. You can view files produced by the analysis. Double-click the index.html file to display the analysis results in a web browser.

Note: When you run an analysis, by default, GSEA writes the analysis results to the GSEA output folder (Help>Show GSEA output folder). The Analysis History page is simply a convenient way to browse the reports in this folder.

14. GSEA Tutorial – Sharing Results with Collaborators

Sharing GSEA analysis results with collaborators is easy. Click Help>Show GSEA output folder to display the folder that holds the GSEA reports, navigate to the subfolder for the report that you want to share, zip it up, and send it to your collaborator. All reports and their hyperlinks are preserved.

Alternatively, when you run an analysis, you can have GSEA create the zip for you by setting the Make a zipped file with all reports parameter to true (by default, the parameter is set to false).
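If you would rather zip a report folder with a script than by hand, a minimal sketch using only the Python standard library might look like this. The function name and the folder path in the usage note are hypothetical; the archive is written next to the report folder and keeps the folder name as its top-level entry, so relative links between the report files still resolve after unzipping.

```python
import shutil
from pathlib import Path


def zip_report(report_dir):
    """Create <report_dir>.zip next to the report folder and return its path."""
    report_dir = Path(report_dir)
    archive = shutil.make_archive(
        base_name=str(report_dir),        # archive path without ".zip"
        format="zip",
        root_dir=str(report_dir.parent),  # paths in the archive start here
        base_dir=report_dir.name,         # keep the report folder itself
    )
    return Path(archive)
```

For example, `zip_report("gsea_home/output/my_report")` would produce `gsea_home/output/my_report.zip`, where the path stands in for a report subfolder of your own output folder.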

Share results with collaborators

15. GSEA Tutorial – Setting Preferences

The Options menu provides several preferences to control the application and algorithm defaults.

One useful preference is the location of your GSEA output folder, which holds all of the analysis results (Help>Show GSEA output folder). By default, the output folder is a subfolder of your GSEA home folder. To change the location of your default output folder, click Options>Preferences. When the Preferences window appears, change the default output folder and click OK.

preferences in GSEA

16. GSEA Tutorial – Creating Data Files for GSEA

The gene set enrichment analysis requires four files: an expression dataset file, phenotype labels file, gene sets file, and chip annotations file. All four files are tab-delimited ASCII text files that can be created and edited using Excel or any text editor.

  1. Expression dataset file: This file contains your expression data: genes/probes, samples, and expression values for each probe in each sample. Your expression data can come from any source (Affymetrix, cDNA two-color ratio data, and so on). You create an expression data file by converting your expression data into a gct, res, or pcl formatted file. Typically, your expression data is already in a tab-delimited ASCII text file, which can be turned into a gct, res, or pcl formatted file with relatively minor edits.
  2. Phenotype label file: This file lists your phenotype labels and associates each sample in your dataset with a phenotype. You can create this file or have GSEA create it for you (you supply the phenotype information and GSEA creates the appropriate file).
  3. Gene sets file: This file defines the gene sets to be analyzed. You can use the gene sets that are available on the Broad ftp site, export gene sets from the MSigDB, or create your own. If you have gene sets that you want to use, GSEA provides a Chip-to-Chip utility, which converts gene/probe identifiers from one DNA chip platform to another (or to HUGO gene symbols).
  4. Chip annotations file: This file maps probe identifiers to HUGO gene symbols. GSEA uses it to collapse each probe set in your dataset to a single gene vector (if you choose to collapse your dataset) and to annotate the gene set enrichment report. The chip annotations files for common DNA chip platforms are available on the Broad ftp site. If necessary (for example, if you are using custom chips), you can create your own chip annotations file.

For descriptions of all of the GSEA file formats, see Data Formats. For more information about creating the data files, see Preparing Data Files for GSEA in the GSEA User Guide.
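As a small illustration of the "relatively minor edits" involved, the sketch below formats an expression table as gct text and a list of phenotype labels as cls text. The layouts reflect my understanding of those two formats (a `#1.2` version line and a dimensions line for gct; a counts line, a class-name line, and a per-sample label line for cls); confirm them against Data Formats before relying on this. The function names, probe identifier, and sample names are hypothetical.

```python
def format_gct(sample_names, rows):
    """Format expression rows as gct text.

    rows: list of (name, description, [value per sample]) tuples.
    """
    lines = ["#1.2", f"{len(rows)}\t{len(sample_names)}"]
    lines.append("\t".join(["NAME", "Description", *sample_names]))
    for name, description, values in rows:
        lines.append("\t".join([name, description, *map(str, values)]))
    return "\n".join(lines) + "\n"


def format_cls(labels):
    """Format per-sample phenotype labels (in dataset column order) as cls text."""
    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    header = f"{len(labels)} {len(classes)} 1"
    return "\n".join([header, "# " + " ".join(classes), " ".join(labels)]) + "\n"


# Hypothetical data: one probe measured in four samples.
samples = ["s1", "s2", "s3", "s4"]
gct_text = format_gct(samples, [("probe_1", "na", [1.5, 2.0, 0.5, 0.7])])
cls_text = format_cls(["MUT", "MUT", "WT", "WT"])
print(gct_text)
print(cls_text)
```

Writing each string to a file with the matching extension gives a pair of files whose sample order agrees, which is what ties each expression column to its phenotype label.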

17. GSEA Tutorial – Examples from Published GSEA Results

The GSEA web site provides the datasets that correspond to results from the GSEA paper (Subramanian & Tamayo, PNAS 2005):

  1. Go to the Downloads page.
  2. Near the bottom of the page, click view datasets.

p53 sample files
Note: Because the random number generators (used for sample permutation) differ and because different seeds are used, numbers in the reports on the website, or in reports run with the sample data, will not precisely match those in the paper. However, the significant sets are identical to the published results.
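The effect of the seed can be seen in a toy permutation test. The sketch below is not the GSEA algorithm: it shuffles phenotype labels and compares a simple difference in means, and every name and value in it is hypothetical. It only shows why the same seed reproduces a result exactly while a different seed can shift the estimate slightly.

```python
import random


def permutation_p_value(values, labels, n_permutations, seed):
    """Estimate a two-sided p-value for a difference in class means.

    Labels are shuffled n_permutations times with a seeded generator.
    Both classes must be present in labels.
    """
    rng = random.Random(seed)
    reference = labels[0]

    def mean_difference(current_labels):
        first = [v for v, l in zip(values, current_labels) if l == reference]
        other = [v for v, l in zip(values, current_labels) if l != reference]
        return sum(first) / len(first) - sum(other) / len(other)

    observed = abs(mean_difference(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # permute the sample labels
        if abs(mean_difference(shuffled)) >= observed:
            hits += 1
    return hits / n_permutations


# Hypothetical expression values for one gene in eight samples.
values = [5.1, 4.9, 5.3, 5.0, 1.0, 1.2, 0.9, 1.1]
labels = ["MUT"] * 4 + ["WT"] * 4

for seed in (1, 2):
    print(seed, permutation_p_value(values, labels, 200, seed))
```

With only a handful of permutations the estimate is coarse, which is why a small number is useful for a trial run and a larger number for the final analysis.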

18. GSEA Tutorial – Getting Help for GSEA

As you begin to use GSEA, you can get help in several ways:

  1. Click Help>GSEA documentation to view the Documentation page, which includes the GSEA User Guide and a Frequently Asked Questions (FAQ) page.
  2. Click the Help button, which appears on most GSEA windows, to display context-sensitive help.
  3. If you cannot find the information that you are looking for in the documentation, e-mail us at gsea@broadinstitute.org.

Thanks for taking the time for this Quick Tour of GSEA. If you have questions, comments or suggestions, we’d like to hear them: gsea@broadinstitute.org.

 

Main Page

Use the navigation bar on the left to display documentation on GSEA software, MSigDB database or GSEA/MSigDB web site. If you have comments or questions not answered by the FAQ or the User Guide, contact us: gsea@broadinstitute.org.

    When contacting our team with questions about the Java GSEA programs, please send the following information:

  • your computer’s operating system
  • the version of Java that you used to run GSEA
  • a detailed log transcript from the GSEA session in question (to view the log, click [+] at the bottom of the main screen of the GSEA Java desktop application, copy the text to a separate file, and attach it to your request)

Where to start

If you are new to GSEA, see the Tutorial for a brief overview of the software. If you have a question, see the FAQ or the User Guide. The User Guide describes how to prepare data files, load data files, run the gene set enrichment analysis, and interpret the results. It also includes instructions for running GSEA from the command line and a Quick Reference section, which describes each window of the GSEA desktop application.

MSigDB gene sets

The current release of the Molecular Signatures Database (MSigDB v5.0) contains 10,348 gene sets for use with GSEA. The best source of information about the gene sets is the MSigDB web site. In addition, an overview of MSigDB gene set collections can be found here.

Please note that gene sets can change or become deprecated in subsequent releases of MSigDB. It is therefore important to state the version of MSigDB in order to fully reference the gene sets used in your study.

Software

We provide the following software implementations of the GSEA method:

  • Java desktop application: Easy-to-use graphical interface that can be run from the Downloads page. The User Guide fully describes this application (referred to as GSEA or GSEA-P).
  • Java jar file: Command line interface that can be downloaded from the Downloads page. See Running GSEA from the Command Line in the User Guide for details. This might be useful for analyzing several datasets sequentially, analyzing large datasets, or running analyses on a compute cluster.
  • R-GSEA: R implementation of GSEA that can be downloaded from the Downloads page. This implementation is intended for experienced computational biologists who want to tweak and experiment with the algorithm. The R-GSEA Readme provides brief instructions, and support is limited. Please note that this implementation has not been actively maintained since 2005.
  • Java source code: Source code and JavaDoc for the Java jar file can be downloaded from the Downloads page. Further information can be found here and in the Release Notes.

Thank you for your interest in GSEA,
The GSEA Team