Grid Computing on the Calclab Systems

Status

gauss.math.tamu.edu

Overview

The Calclab systems have grown over the years into a respectable computational resource. Taken together, the 191 desktop systems offer 1.971Tops peak performance and 296GB of total memory, and they are being made available to SURAgrid users.

The systems are in use for classes on weekdays from approximately 8:00 a.m. until 6:00 p.m. Two of the labs are also used for help sessions Mon-Thu, 7:00 p.m. - 10:00 p.m., and Sunday, 1:00 p.m. - 5:00 p.m. During off hours the systems will be available for computational work; at help-session times, the systems in labs not hosting a session will be made available for grid computing. While it would be nice to make this a 24/7 grid resource, using something like Condor to take advantage of unused cycles at any time of day, we need to keep CPU fan noise to a minimum when labs are in session.

Software

We are using the TORQUE Resource Manager to schedule batch jobs. Two queues are available: night, for overnight jobs, and weekend, for weekend jobs. To accommodate the daily Calclab class schedule, we use standing reservations in the Maui scheduler to prevent jobs from being scheduled that would overlap with class time. We also take the nodes offline at those times so that the Globus MDS reports zero nodes available.
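As a rough illustration of what the standing reservations enforce, the following Python sketch checks whether a job with a given start time and walltime would run into class hours. It assumes the class window is exactly 8:00 a.m. to 6:00 p.m. on weekdays and ignores help sessions and holidays. The function name is hypothetical; the real decision is made by the Maui scheduler, not by user code.

```python
from datetime import datetime, time, timedelta

# Assumed class window, taken from the overview: weekdays, roughly 8:00 to 18:00.
CLASS_START = time(8, 0)
CLASS_END = time(18, 0)


def overlaps_class_time(start: datetime, walltime: timedelta) -> bool:
    """Return True if a job running over [start, start + walltime) touches class hours."""
    end = start + walltime
    day = start.date()
    # Check the class window of every calendar day the job touches.
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            window_start = datetime.combine(day, CLASS_START)
            window_end = datetime.combine(day, CLASS_END)
            if start < window_end and end > window_start:
                return True
        day += timedelta(days=1)
    return False


# A job started Monday at 7:00 p.m. fits overnight with 12 hours of walltime,
# but not with 14 hours, because it would still be running at 8:00 a.m. Tuesday.
monday_evening = datetime(2009, 3, 23, 19, 0)
print(overlaps_class_time(monday_evening, timedelta(hours=12)))  # False
print(overlaps_class_time(monday_evening, timedelta(hours=14)))  # True
```

A check like this can help in choosing a walltime request that fits the night or weekend window before submitting.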

Information on using TORQUE on the Calclab cluster can be found here.

Message passing is handled through the Open MPI 1.3 and MPICH 1.2.27 libraries. The mpirun command has been modified to verify that the MPI application is running in batch mode (interactive mode is not allowed) and to pick up the PBS_NODEFILE environment variable from TORQUE.
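The logic of such a wrapper can be sketched in Python as follows. This is not the actual modified mpirun, whose implementation is not shown here. It assumes that batch mode is detected through TORQUE's PBS_ENVIRONMENT variable (set to PBS_BATCH for batch jobs) and that the node file lists one hostname per line, repeated once per slot. The function names are hypothetical.

```python
import os
from collections import Counter


def in_batch_job(env=os.environ) -> bool:
    """True when TORQUE marks the job as batch (PBS_ENVIRONMENT=PBS_BATCH)."""
    return env.get("PBS_ENVIRONMENT") == "PBS_BATCH"


def read_nodefile(path: str) -> Counter:
    """Count the slots per host in a PBS node file (one hostname per line)."""
    with open(path) as fh:
        return Counter(line.strip() for line in fh if line.strip())


def mpirun_args(env=os.environ) -> list:
    """Build an mpirun argument list from the batch environment (illustrative only)."""
    if not in_batch_job(env):
        raise RuntimeError("MPI applications must be run from a batch job")
    nodefile = env["PBS_NODEFILE"]
    slots = sum(read_nodefile(nodefile).values())
    # One process per slot listed in the node file.
    return ["mpirun", "-np", str(slots), "-machinefile", nodefile]
```

Inside a batch job, the resulting list could be extended with the application name and passed to `subprocess.run`; outside of one, the sketch refuses to run, mirroring the restriction on interactive use.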

For grid software we are using the Globus Toolkit 4.0.7.

Storage

The /data filesystem is available for scratch usage. It is not backed up. When this filesystem starts getting full we will begin to enforce time limits on files. Home directories are typically stored under /u and are subject to quotas.

Environment Variables

The following shell environment variables are defined:

SURAGRID_SHARED_SCRATCH=/data/scratch
SURAGRID_SHARED_PROJECTS=/data/sura/projects
SURAGRID_SCRATCH=/tmp
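A job can read these variables instead of hard-coding paths. The Python sketch below looks them up and falls back to the values listed above when they are unset. The per-user, per-job directory layout is a hypothetical convention, not a site requirement, and the use of PBS_JOBID assumes the code runs inside a TORQUE job.

```python
import os
from pathlib import Path

# Defaults mirror the values listed above; the real environment takes precedence.
DEFAULTS = {
    "SURAGRID_SHARED_SCRATCH": "/data/scratch",
    "SURAGRID_SHARED_PROJECTS": "/data/sura/projects",
    "SURAGRID_SCRATCH": "/tmp",
}


def suragrid_path(name: str, env=os.environ) -> Path:
    """Look up one of the SURAgrid directories, falling back to the documented value."""
    return Path(env.get(name, DEFAULTS[name]))


def job_scratch_dir(env=os.environ, shared: bool = True) -> Path:
    """Return a per-user, per-job working directory path (layout is hypothetical)."""
    variable = "SURAGRID_SHARED_SCRATCH" if shared else "SURAGRID_SCRATCH"
    user = env.get("USER", "unknown")
    job = env.get("PBS_JOBID", "interactive")
    return suragrid_path(variable, env) / user / job
```

Calling `job_scratch_dir().mkdir(parents=True, exist_ok=True)` at the start of a job would create the directory. Since /data is not backed up and may become subject to time limits, results worth keeping should be copied elsewhere.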

Access

Access to the Calclab computational resources will be granted to researchers at Texas A&M and its grid partners (SURAgrid and TIGRE) involved in research projects with significant computational requirements. Those interested in using the Calclabs should e-mail Steve Johnson, steve /AT/ math.tamu.edu, including their contact information, a brief description of their research, and the status of their application. Accounts will be active until the end of the current A&M fiscal year, August 31. In general, usage is governed by the Calclab Policies. Of particular importance is the One Login / One Person account policy: a Calclab account is to be used only by the person to whom it was granted. Also applicable is the SURAgrid Acceptable Use Policy. In addition to these policies, we require that all computational work be performed during off-hours, as the labs are first and foremost a resource for lab work to support our classes.

Finally

The idea of putting the Calclabs to use as a computational resource is fairly new, and the configuration is likely to evolve. We have a regular schedule for retiring old hardware and replacing it with new, high-performance equipment. We hope that the use of Calclabs will produce meaningful results for complex computational problems and that these results will serve as an incentive for the powers-that-be to upgrade the Calclab networking to Gigabit Ethernet.

We will likely continue to evaluate other scheduling software in an effort to make our off-hours restriction as transparent to the end user as possible. Because this is a very new resource, it may suffer from hardware and software failures. With that in mind, we ask that you checkpoint your code if at all possible so that it can be resumed in the event of an interruption.
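One simple way to checkpoint is to save the program state periodically and reload it on startup. The Python sketch below is a minimal example under stated assumptions: the state is small enough to serialize as JSON, the loop is a toy computation standing in for real work, and all function names are hypothetical. Writing to a temporary file and renaming it means an interruption during the write leaves the previous checkpoint intact.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to a temporary file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the saved state, or a copy of the default if no checkpoint exists."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return dict(default)


def run(path: str, total_steps: int, every: int = 100) -> dict:
    """Toy loop that sums integers and checkpoints every `every` steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If the job is killed and resubmitted with the same checkpoint path, `run` resumes from the last saved step rather than starting over.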


Last update: 23Mar2009 slj
