Overview of GridEngine
Sun Grid Engine is a Load Management System
(LMS) that allocates resources such as processors (CPUs),
memory, disk space, and computing time. Like other LMSs,
Grid Engine enables transparent load sharing, controls the
sharing of resources, and implements utilization and
site policies.
Its features include batch queuing and load balancing,
and it lets users suspend and resume jobs and check the
status of their jobs.
Additional information about Grid Engine may be found in the
general HPCVL SGE
FAQ. This document provides further information that is specific
to the HPCVL-Carleton Linux cluster.
Submitting MPI Jobs
You submit a job to Grid Engine, and it runs the job when
compute nodes are available. A job is set up in an SGE script file.
This file contains the configuration options for Grid Engine and the
shell commands that you want to run.
To submit an MPI job to Grid Engine, pass your SGE script file to
qsub. For example, qsub -pe mpi 4 example.sge submits the script
example.sge using 4 processors.
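As a minimal sketch of the script file and submission described above, the following writes a small example.sge and submits it. The script contents are illustrative assumptions, not a file supplied by the site: ./my_mpi_program is a hypothetical executable, and the exact mpirun invocation depends on how MPI is set up on the cluster. Lines beginning with #$ are read by Grid Engine as qsub options, and NSLOTS is the variable Grid Engine sets to the number of slots granted to the job.

```shell
# Write a minimal job script (illustrative contents, not site-provided).
cat > example.sge <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
# NSLOTS is set by Grid Engine to the number of slots granted to the job.
# ./my_mpi_program is a hypothetical executable name.
mpirun -np $NSLOTS ./my_mpi_program
EOF

# Submit with 4 processors in the MPI parallel environment.
# The guard only skips submission on machines where qsub is not installed.
if command -v qsub >/dev/null 2>&1; then
    qsub -pe mpi 4 example.sge
fi
```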
Command-line options to qsub may either be passed directly to qsub
or stored in your SGE script file. The following is a list of common
options and their meanings:
- -pe mpi 8 would run the job in the MPI parallel
environment, using 8 processors. The parallel environment tells
GridEngine the type of parallel job that you are running.
- -cwd tells Grid Engine to run your program from the
current working directory, that is, the directory from which you
submitted the job.
- -M your@email.com -m b,e,a,s will send mail to
your@email.com when the following events occur: job
beginning, job ending, job aborted, job
suspended.
Please note that Carleton blocks most external email, so you can
only use this option with Carleton email addresses.
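- -v [VAR]=[VAL] defines the environment
variable [VAR] with the value [VAL] for the job.
For example, qsub -v LD_LIBRARY_PATH=/usr/local/lib
... is similar to running the bash command export
LD_LIBRARY_PATH=/usr/local/lib in the job's environment.
Note that the uppercase -V option takes no argument and instead
exports all of your current environment variables to the job.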
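The options listed above can also be stored in the script file itself. The sketch below assumes the default Grid Engine directive prefix #$, and uses the placeholder address and library path from the list; the commands after the directives are only illustrative. Options given on the qsub command line take precedence over the same options in the script.

```shell
#!/bin/bash
# Each "#$" line is read by qsub as if it were a command-line option.
#$ -pe mpi 8
#$ -cwd
#$ -M your@email.com
#$ -m b,e,a,s
# Lowercase -v sets a single environment variable for the job.
#$ -v LD_LIBRARY_PATH=/usr/local/lib

# Shell commands for the job follow the directives (illustrative only).
JOB_DIR=$(pwd)
echo "Job running in $JOB_DIR"
```

With the options stored this way, the job can be submitted with just qsub followed by the script name.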
Submitting Hybrid (MPI/OpenMP) Jobs
You are able to run hybrid programs that use MPI and OpenMP together.
To run such programs:
- Use the mpicc4 program to compile your hybrid
OpenMP/MPI code.
- Make sure you use the -l blocknode=1 flag, either on
your qsub command line or in your SGE script file.
A sample OpenMP/MPI program is
available to demonstrate. Here are the steps to try out this example
program:
- wget http://people.scs.carleton.ca/~hpcvl/openmp_mpi_example.tar.gz
- tar zxvf openmp_mpi_example.tar.gz
- cd openmp
- make
- qsub -pe mpi 4 example2.sge
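For a sense of what a hybrid job script might contain, here is a hedged sketch. It is not the contents of example2.sge from the archive. The file name hybrid.sge, the executable ./hybrid_program, and the thread count of 2 are all assumptions made for illustration. OMP_NUM_THREADS is the standard OpenMP variable that sets the number of threads per process.

```shell
# Write a hypothetical hybrid job script (not the example2.sge from the archive).
cat > hybrid.sge <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -l blocknode=1
# Threads per MPI process; 2 is an arbitrary illustrative value.
export OMP_NUM_THREADS=2
# ./hybrid_program is a hypothetical executable built with mpicc4.
mpirun -np $NSLOTS ./hybrid_program
EOF

# Submit as in the steps above; skipped where qsub is not installed.
if command -v qsub >/dev/null 2>&1; then
    qsub -pe mpi 4 hybrid.sge
fi
```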
Monitoring GridEngine
You may monitor Grid Engine using:
- qstat shows brief information on current jobs.
- qstat -f provides details on each compute node.
- qstat -j [jobid] gives details on the SGE job with id
[jobid].
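The qstat -j command can also be used from a script to wait for a job to finish. The helper below is a sketch, not a Grid Engine tool: wait_for_job is a hypothetical function name, and it assumes that qstat -j exits with a non-zero status once the job is no longer known to the scheduler.

```shell
# Poll until the given job is no longer known to Grid Engine.
# Usage: wait_for_job JOBID [INTERVAL_SECONDS]
wait_for_job() {
    jobid=$1
    interval=${2:-30}
    # Assumption: qstat -j fails once the job has left the queue.
    while qstat -j "$jobid" >/dev/null 2>&1; do
        sleep "$interval"
    done
    echo "job $jobid is no longer in the queue"
}
```

For example, wait_for_job 1234 60 would check job 1234 once a minute. A long polling interval keeps the load on the scheduler low.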
Controlling GridEngine Jobs
- qdel [jobid] aborts an unfinished job.
- qdel -u [username] aborts all jobs submitted by a particular user.
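To use qdel from a script, you need the job id that qsub reported. The sketch below assumes qsub prints its usual confirmation line of the form: Your job 1234 ("example.sge") has been submitted. The function name extract_job_id is hypothetical.

```shell
# Extract the numeric job id from qsub's confirmation line.
# Prints nothing if the line does not match the expected format.
extract_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Example use on the cluster:
#   jobid=$(extract_job_id "$(qsub -pe mpi 4 example.sge)")
#   qdel "$jobid"
```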
Troubleshooting
If you have any problems running GridEngine jobs on the Carleton
cluster, please contact HPCVL-Carleton staff.
Further Reading
The primary documentation for using the GridEngine system is the
User Manual [docs.sun.com].