HPCVL-Carleton GridEngine Guide

Overview of GridEngine

Sun Grid Engine is a Load Management System (LMS) that allocates resources such as processors (CPUs), memory, disk space, and computing time. Like other LMSs, Grid Engine enables transparent load sharing, controls the sharing of resources, and implements utilization and site policies.

Its features include batch queuing and load balancing. It also lets users suspend and resume their jobs and check their status.

Additional information about Grid Engine may be found in the general HPCVL SGE FAQ. This document provides further information that is specific to the HPCVL-Carleton Linux cluster.

Submitting MPI Jobs

You submit a job to GridEngine, and GridEngine runs it for you when compute nodes become available. A job is set up in an SGE script file. This file contains the configuration options for GridEngine and the shell commands that you want to run.
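As a rough illustration of what such a script file can look like, the sketch below writes a minimal example.sge to the current directory. Lines beginning with #$ are read by GridEngine as options, and the remaining lines are ordinary shell commands. The program name my_mpi_program is hypothetical, and the exact mpirun invocation is an assumption that may differ on the Carleton cluster.

```shell
# Create a minimal SGE script file (a sketch, not an official template).
cat > example.sge <<'EOF'
#!/bin/bash
# Lines starting with "#$" are options read by GridEngine.
#$ -cwd
#$ -pe mpi 4

# NSLOTS is set by GridEngine to the number of slots granted to the job.
# "my_mpi_program" is a hypothetical name; replace it with your own binary.
mpirun -np $NSLOTS ./my_mpi_program
EOF

echo "Wrote example.sge"
```

Because the options are stored in the file, a script like this could be submitted with just qsub example.sge.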

To submit an MPI job to GridEngine, pass your SGE script file (such as example.sge) to the qsub command, for example qsub -pe mpi 4 example.sge, which requests 4 processors. Options to qsub may either be passed directly on the command line or stored in your SGE script file. The following is a list of common options and their meanings:

  • -pe mpi 8 would run the job in the MPI parallel environment, using 8 processors. The parallel environment tells GridEngine the type of parallel job that you are running.
  • -cwd tells GridEngine to change to the directory from which you submitted the job before running your program.
  • -M your@email.com -m b,e,a,s will send mail to your@email.com when the following events occur: job beginning, job ending, job aborted, job suspended.
    Please note that Carleton blocks most external email, so you can only use this option with Carleton email addresses.
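  • -v [VAR]=[VAL] defines the environment variable [VAR] with the value [VAL] for the job. For example, qsub -v LD_LIBRARY_PATH=/usr/local/lib ... is similar to running the bash command export LD_LIBRARY_PATH=/usr/local/lib before your program starts. (The uppercase -V option takes no argument and instead exports all of your current environment variables to the job.)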
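The options above can also be combined on a single qsub command line. The following is a minimal sketch under stated assumptions: submit_mpi and DRY_RUN are hypothetical names introduced here for illustration, and by default the helper only prints the command it would run, so it can be tried on a machine without GridEngine.

```shell
# Hypothetical helper: assemble a qsub command from the options listed above.
# Usage: submit_mpi <processors> <sge-script>
submit_mpi() {
    slots=$1
    script=$2
    # Replace the function's arguments with the full command line.
    set -- qsub -pe mpi "$slots" -cwd -M your@email.com -m b,e,a,s "$script"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"      # dry run: only show the command
    else
        "$@"           # DRY_RUN=0: actually submit the job
    fi
}

# Prints the command line for an 8-processor MPI job.
submit_mpi 8 example.sge
```

Setting DRY_RUN=0 before calling the helper on the cluster would submit the job instead of printing the command.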

Submitting Hybrid (MPI/OpenMP) Jobs

You are able to run hybrid programs that use MPI and OpenMP together. To run such programs:

  • Use the mpicc4 program to compile your hybrid OpenMP/MPI code.
  • Make sure you use the -l blocknode=1 flag, either on your qsub command line or in your SGE script file.

A sample OpenMP/MPI program is available to demonstrate this. Here are the steps to try out this example program:
  1. wget http://people.scs.carleton.ca/~hpcvl/openmp_mpi_example.tar.gz
  2. tar zxvf openmp_mpi_example.tar.gz
  3. cd openmp
  4. make
  5. qsub -pe mpi 4 example2.sge
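For orientation, a hybrid job script might look something like the sketch below. This is not the content of example2.sge from the sample archive. The file name hybrid_sketch.sge, the program name hybrid_program, the thread count, and the mpirun invocation are all assumptions made for illustration; only the -pe mpi and -l blocknode=1 options come from this guide.

```shell
# Write a hypothetical hybrid MPI/OpenMP job script (illustrative only).
cat > hybrid_sketch.sge <<'EOF'
#!/bin/bash
#$ -cwd
#$ -pe mpi 4
#$ -l blocknode=1

# OMP_NUM_THREADS is the standard OpenMP variable for the number of threads
# per process; the value 2 is an assumed example, not a site recommendation.
export OMP_NUM_THREADS=2

# "hybrid_program" is a hypothetical binary compiled with mpicc4.
mpirun -np $NSLOTS ./hybrid_program
EOF

echo "Wrote hybrid_sketch.sge"
```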

Monitoring GridEngine

You may monitor GridEngine using:

  • qstat shows brief information on current jobs.
  • qstat -f provides details on each compute node.
  • qstat -j [jobid] gives details on the SGE job with ID [jobid].
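These commands can be combined into a simple polling loop that waits for a job to leave the queue. The sketch below assumes that plain qstat lists your jobs with the job ID in the first column. The function names job_listed and wait_for_job and the POLL_SECONDS variable are hypothetical names introduced here.

```shell
# Succeeds if the job ID given as $1 appears in the first column
# of the qstat output supplied on standard input.
job_listed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Poll qstat until the job is no longer listed.
# Usage: wait_for_job [jobid]
wait_for_job() {
    while qstat | job_listed "$1"; do
        sleep "${POLL_SECONDS:-30}"   # wait between checks (default 30 s)
    done
    echo "Job $1 is no longer listed by qstat"
}
```

On the cluster, wait_for_job followed by a job ID would block until that job finishes or is deleted. A longer polling interval keeps the load on the GridEngine server low.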

Controlling GridEngine Jobs

  • qdel [jobid] will abort an unfinished job.
  • qdel -u [username] will abort all jobs submitted by a particular user.

Troubleshooting

If you have any problems running GridEngine jobs on the Carleton cluster, please contact HPCVL-Carleton staff.

Further Reading

The primary documentation for using the GridEngine system is the User Manual [docs.sun.com].

 
  © HPCVL 2012
Last updated on Wednesday, 16-May-2012 11:01:57 EDT