Running ADCIRC on the Calclabs

Overview

This page contains my notes for getting the ADCIRC application running on the Calclab SURAgrid resource at Texas A&M University, under SuSE 10.1 (32-bit). These notes are far from complete, but will hopefully provide some pointers to those attempting to get ADCIRC running at their site.

The requirements for running the grid version of ADCIRC are listed here. At our site, we had to do the following:

Install MPICH-1.2.x

Nothing really new here. Just grab a copy from Argonne's MPICH site and build it according to the instructions. We built it with the 32-bit GNU compilers. You'll probably need to add the MPICH bin directory to your PATH; we do this in /etc/profile.d/mpich.sh under SuSE Linux.
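A minimal sketch of what /etc/profile.d/mpich.sh could contain, assuming MPICH was installed under /usr/local/mpich (a hypothetical prefix; use whatever you passed to configure). MPICH_HOME is a convenience variable introduced here, not something MPICH itself requires.

```shell
# /etc/profile.d/mpich.sh (sketch): the install prefix below is an assumption.
MPICH_HOME=${MPICH_HOME:-/usr/local/mpich}
export MPICH_HOME

# Prepend MPICH's bin directory to PATH, but only if it isn't already there,
# so re-sourcing the file (e.g., from nested login shells) adds it once.
case ":$PATH:" in
  *":$MPICH_HOME/bin:"*) ;;
  *) PATH="$MPICH_HOME/bin:$PATH" ;;
esac
export PATH
```

Putting MPICH first in PATH means its mpirun wins over any other MPI stack the OS may have installed.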

Install a Batch System

Again, nothing really special here. If you're using LSF or SGE, you might as well stop reading here, because the rest of this document is pretty specific to PBS. We use the open source version, TORQUE. A standard installation should do the trick. Set up the queues to your taste; we have "night" and "weekend" queues. The scheduler does limit us in one respect: we want all jobs in the "night" queue to end by 7:30 a.m. Central time. This option isn't really available in the default PBS scheduler, so it's something we'll have to keep looking at.
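The queue setup itself is a handful of qmgr commands. The sketch below creates "night" and "weekend" execution queues and makes "night" the server default (which is where the grid ADCIRC jobs land). It deliberately sets no walltime or node limits and does nothing about the 7:30 a.m. cutoff; the QMGR variable and the function names are hypothetical conveniences for a dry run.

```shell
# Sketch: create our two TORQUE queues and make "night" the default.
# Run as root on the pbs_server host.  Set QMGR=echo to just print the commands.
QMGR=${QMGR:-qmgr}

make_queue() {
  # $1 is the queue name.
  $QMGR -c "create queue $1 queue_type=execution"
  $QMGR -c "set queue $1 enabled=true"
  $QMGR -c "set queue $1 started=true"
}

setup_site_queues() {
  make_queue night
  make_queue weekend
  $QMGR -c "set server default_queue=night"
}
```

Try `QMGR=echo setup_site_queues` first, then run `setup_site_queues` for real and review the result with `qmgr -c "print server"`.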

Install Globus-4.0.x

Install Globus from your favorite site. Whether you compile from source or install a binary distribution, you'll need the pre-WS GRAM and MDS. I compiled 4.0.3 from source with no problems. We did, however, have some issues with our compiled version of Globus. Under our previous OS, the 4.0.1 slapd binary, which is used for the pre-WS MDS, did not link correctly and gave a run-time error pertaining to threads. I also had a VDT-1.3.9 installation that included a working slapd, so I set up LD_LIBRARY_PATH for that slapd to use the VDT libraries, which provided a suitable workaround. YMMV.
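The slapd workaround boils down to prepending the VDT library directory to LD_LIBRARY_PATH just for the slapd process. In this sketch, the /opt/vdt default for VDT_LOCATION and the globus/lib and globus/libexec/slapd layout beneath it are assumptions; check where your VDT actually put them. run_vdt_slapd is a hypothetical helper name.

```shell
# Sketch: start the VDT-provided slapd with the VDT libraries first in the
# search path.  VDT_LOCATION and the directory layout are assumptions.
: "${VDT_LOCATION:=/opt/vdt}"

run_vdt_slapd() {
  # Prepend (don't replace) so any existing entries still resolve.
  LD_LIBRARY_PATH="$VDT_LOCATION/globus/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
  # exec replaces this shell, so slapd inherits the environment and our PID.
  exec "$VDT_LOCATION/globus/libexec/slapd" "$@"
}
```

Saved as a small script that ends with `run_vdt_slapd "$@"`, it can stand in for the slapd binary in the GRIS startup, leaving the system-wide LD_LIBRARY_PATH untouched.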

Install PBS Utilities for Globus

The trickiest part is installing the utilities that let Globus communicate with PBS. When the ADCIRC application is downloaded to our resource, it arrives as a shar file and is submitted to our default PBS queue, "night". ADCIRC expects N nodes to be set up when the job is submitted. The shar file runs on the first node of the job, extracts the input data, and ultimately calls mpirun to execute the ADCIRC MPI binary. The default MPICH installation provides an mpirun script that runs on only one node if the -np N option is absent. We modified the mpirun.args script so that np defaults to the number of lines in the file named by the PBS_NODEFILE environment variable. If this variable is not set, mpirun reverts to its normal behavior of np=1.

We made the following modification at line 8 of the mpirun.args script:

BEFORE

np=1

AFTER

# Default np to the number of lines (processor slots) in the PBS node file.
if [ -n "$PBS_NODEFILE" ]; then
  np=`wc -l "$PBS_NODEFILE" | awk '{ print $1 }'`
else
  np=1
fi
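To sanity-check the patch without submitting a job, the same logic can be fed a fake node file. TORQUE writes one line per processor slot, so a request like `-l nodes=2:ppn=2` lists each host twice and np should come out as 4. default_np and the node names are made up for this illustration.

```shell
# Sketch: reproduce the patched mpirun.args default outside of mpirun.
default_np() {
  if [ -n "$PBS_NODEFILE" ]; then
    wc -l < "$PBS_NODEFILE" | awk '{ print $1 }'
  else
    echo 1
  fi
}

# Fake node file mimicking "-l nodes=2:ppn=2".
nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$nodefile"
echo "inside a job:  np=$(PBS_NODEFILE=$nodefile default_np)"
echo "outside a job: np=$(unset PBS_NODEFILE; default_np)"
rm -f "$nodefile"
```

This prints np=4 for the fake job and np=1 when PBS_NODEFILE is unset.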

We also call a small local script to enforce our policy of permitting MPI applications to run only from a batch job. We don't want the head node to participate in these jobs and become overloaded. To call this script, we made a slight modification to the main mpirun script at line 79, after $MPIRUN_HOME/mpirun.args has been sourced:

ADDED

[ -f "$MPIRUN_HOME/mpirun.local" ] && . "$MPIRUN_HOME/mpirun.local"

Our mpirun.local script is pretty simple:

mpirun.local

# mpirun.local
# Sourced by mpirun to pick up local modifications before the
# version-specific mpirun (e.g., mpirun.ch_p4) is called.
#
if [ -z "$PBS_NODEFILE" ]; then
  echo "mpirun must be run from an OpenPBS job." 1>&2
  exit 1
fi

machineFile="$PBS_NODEFILE"
export machineFile
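With that guard in place, anyone running MPI by hand needs a job script. A minimal TORQUE sketch follows; the job name, node request, and the ./padcirc binary name are placeholders (padcirc is assumed here as the name of the ADCIRC MPI executable), not the script the grid ADCIRC submission uses.

```shell
#!/bin/sh
# Hypothetical TORQUE job script for a hand-run ADCIRC test.  The job name,
# node request, and ./padcirc binary name are placeholders.
#PBS -N adcirc-test
#PBS -q night
#PBS -l nodes=2:ppn=2
#PBS -j oe

# Batch jobs start in $HOME; move to the directory qsub was run from.
cd "$PBS_O_WORKDIR" || exit 1

# PBS_NODEFILE is set inside the job, so mpirun.local lets the run proceed
# and the patched mpirun.args sets np to 4 (one per processor slot).
mpirun ./padcirc
```

Submit it with `qsub adcirc-test.pbs`; no -np option is needed.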

By default the PBS JobManager for Globus doesn't quite do what you expect it to do. The job submission is controlled via a Perl module, $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/pbs.pm. There are some changes that need to be made to this script.

Environment Variables

Edit /etc/xinetd.d/gsiftp and add the following definitions:

        env             += GLOBUS_LOCATION=/usr/local/globus-4.0.1
        env             += LD_LIBRARY_PATH=/usr/local/globus-4.0.1/lib
        env             += GLOBUS_TCP_PORT_RANGE=30000,30100

Edit /etc/init.d/globus-gatekeeper and /etc/init.d/globus-gris (or whatever your startup scripts are named) and add the following lines near the top of the files:

	GLOBUS_TCP_PORT_RANGE=30000,31000
	GLOBUS_LOCATION=/usr/local/globus-4.0.1
	export GLOBUS_LOCATION GLOBUS_TCP_PORT_RANGE

Set up the user environment for /bin/{sh,ash,ksh,bash} by creating /etc/profile.d/globus.sh:

	GLOBUS_TCP_PORT_RANGE=30000,31000
	GLOBUS_LOCATION=/usr/local/globus-4.0.1
	export GLOBUS_LOCATION GLOBUS_TCP_PORT_RANGE
	. $GLOBUS_LOCATION/etc/globus-user-env.sh

Similarly, set up the user environment for /bin/{csh,tcsh} by creating /etc/profile.d/globus.csh:

	setenv GLOBUS_TCP_PORT_RANGE 30000,31000
	setenv GLOBUS_LOCATION /usr/local/globus-4.0.1
	source $GLOBUS_LOCATION/etc/globus-user-env.csh

PBS reporting to MDS

Your Globus installation should be compiled with the --enable-prewsmds flag to the ./configure script. This will build the core utilities for running the pre-WS MDS, which is basically an OpenLDAP server. The notes below are from a Globus-4.0.1 source installation under SuSE 9.3 x86_64.

Define PBS_HOME to be the location of the PBS spool directory:
export PBS_HOME=/usr/spool/PBS

Download the GRAM Reporter Scheduler Support files from the Globus 2.4.3 downloads page. Specifically, you'll need globus_gram_reporter-2.0.tar.gz and globus_gram_reporter_setup_pbs-1.0.tar.gz.

Untar both packages.

Build and install the base gram reporter:
cd globus_gram_reporter-2.0
./configure --with-flavor=gcc64dbg # your flavor may be different (e.g., gcc32dbg on a 32-bit system)
make
make install

Build the PBS component of the reporter:
cd ../globus_gram_reporter_setup_pbs-1.0
./configure
make
make install

Now we're ready to set up the reporter.
cd $GLOBUS_LOCATION/setup/globus
./setup-globus-mds-gris
./setup-globus-gram-reporter-pbs

We're going to run the GRIS slapd as the user daemon, so we need to change the ownership and permissions of some files and directories:
chmod 1777 $GLOBUS_LOCATION/var
touch $GLOBUS_LOCATION/var/{jobs,grid-info-system.log,grid-info-cpu-cache.sh}
chown daemon:daemon $GLOBUS_LOCATION/var/{openldap-ldbm,jobs,grid-info-system.log,grid-info-cpu-cache.sh}
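A quick way to confirm the chown took effect before starting the GRIS is to compare each file's owner with the expected account. This sketch relies on GNU stat (as shipped with SuSE); check_owner is a hypothetical helper name.

```shell
# Sketch: report any file not owned by the expected user.
# Usage: check_owner USER FILE...   (returns nonzero if anything is off)
check_owner() {
  want=$1; shift
  rc=0
  for f in "$@"; do
    # GNU stat prints the owner's user name; a missing file is reported too.
    owner=$(stat -c %U "$f" 2>/dev/null) || owner="(missing)"
    if [ "$owner" != "$want" ]; then
      echo "WRONG OWNER: $f is owned by $owner, expected $want" >&2
      rc=1
    fi
  done
  return $rc
}
```

For the files above: `cd "$GLOBUS_LOCATION/var" && check_owner daemon openldap-ldbm jobs grid-info-system.log grid-info-cpu-cache.sh && echo "ownership OK"`.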

Take a look at $GLOBUS_LOCATION/etc/globus-job-manager.conf. If the line for -globus-gatekeeper-subject includes "unavailable at time of install", then set it to the subject of your host cert. E.g., "/C=US/ST=Texas/L=...".
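If you'd rather script that edit, the sketch below rewrites only the placeholder line and keeps a backup as <file>.orig. It assumes the value sits in double quotes on a line beginning with -globus-gatekeeper-subject; check your copy of the file first. fix_gatekeeper_subject is a hypothetical helper name, and awk reads the subject from the environment so that the slashes in a DN need no escaping.

```shell
# Sketch: replace the "unavailable at time of install" gatekeeper subject.
# Usage: fix_gatekeeper_subject CONF_FILE SUBJECT
fix_gatekeeper_subject() {
  conf=$1
  subject=$2
  cp -p "$conf" "$conf.orig" || return 1
  # Rewrite from the backup into the original file (keeps its permissions).
  SUBJ=$subject awk '
    /^-globus-gatekeeper-subject/ && /unavailable at time of install/ {
      print "-globus-gatekeeper-subject \"" ENVIRON["SUBJ"] "\""
      next
    }
    { print }
  ' "$conf.orig" > "$conf"
}
```

The subject can come straight from the host certificate, assuming it lives at the usual /etc/grid-security/hostcert.pem: `fix_gatekeeper_subject "$GLOBUS_LOCATION/etc/globus-job-manager.conf" "$(grid-cert-info -subject -file /etc/grid-security/hostcert.pem)"`. Verify that grid-cert-info prints the slash-separated form shown above before relying on it.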

Edit $GLOBUS_LOCATION/libexec/globus-script-pbs-queue. This file has a typo at line 52 (a missing '{').
Change
cut=$GLOBUS_SH_CUT-cut}
to
cut=${GLOBUS_SH_CUT-cut}
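`${GLOBUS_SH_CUT-cut}` expands to $GLOBUS_SH_CUT if that variable is set and to the literal word cut otherwise. Without the opening brace, the shell expands $GLOBUS_SH_CUT and appends the text -cut}, so $cut ends up holding something like /usr/bin/cut-cut}. GNU sed can apply the fix and keep a .orig backup; fix_cut_line is a hypothetical helper name.

```shell
# Sketch: restore the missing "{" with GNU sed, keeping <file>.orig.
# Leading whitespace on the line, if any, is preserved.
fix_cut_line() {
  sed -i.orig 's/^\([[:space:]]*\)cut=\$GLOBUS_SH_CUT-cut}$/\1cut=${GLOBUS_SH_CUT-cut}/' "$1"
}
```

Usage: `fix_cut_line "$GLOBUS_LOCATION/libexec/globus-script-pbs-queue"`.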

Edit $GLOBUS_LOCATION/etc/grid-info-slapd.conf and change the timelimit value from 3600 to 30.
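The same change made with GNU sed, as a sketch that assumes the directive sits on its own line in the form `timelimit 3600`; set_timelimit is a hypothetical helper name, and a .orig backup is kept.

```shell
# Sketch: set slapd's "timelimit" directive to a new number of seconds.
# Usage: set_timelimit CONF_FILE SECONDS
set_timelimit() {
  conf=$1; secs=$2
  # Keep "timelimit" and its spacing (group 1); replace only the number.
  sed -i.orig 's/^\([[:space:]]*timelimit[[:space:]]\{1,\}\)[0-9]\{1,\}[[:space:]]*$/\1'"$secs"'/' "$conf"
}
```

Usage: `set_timelimit "$GLOBUS_LOCATION/etc/grid-info-slapd.conf" 30`.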

Edit $GLOBUS_LOCATION/libexec/grid-info-mds-core. You need to put a timeout on the ldapsearch at or near line 51. If you don't do this, the slapd process will hang and not respond to requests.
Change
ldapsearch -x -h ${GRID_INFO_HOST} ...
to
ldapsearch -I 10 -x -h ${GRID_INFO_HOST} ...

Edit $GLOBUS_LOCATION/etc/grid-info-resource-ldif.conf. This is an LDIF file and controls what information is displayed by the LDAP server. I prefer to keep my filesystem and network info out of public view. Simply comment out the sections you don't want displayed to the outside world. If your reporter installation ran correctly, you should have a couple of sections at the bottom of this file related to GRAM reporting.

Finally, as root, start up the GRIS:
/etc/init.d/globus-gris start
Here's my script for doing this: globus-gris.init. Yes, this file could probably use some work. There's also a sample startup script at $GLOBUS_LOCATION/setup/globus/SXXgris.

Look for processes owned by daemon: ps wU daemon. You should see a slapd process running along with a couple of grid-info-soft-register scripts, and probably a sleep process.

Test your installation:
grid-info-search -x -h localhost
In particular, look for a line that reads Mds-Computer-Total-Free-nodeCount: N. This is used by the ADCIRC app when selecting a resource on which to run.
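To pull just that value out of the search output (for a cron check, say), a one-line awk filter is enough. free_nodes is a hypothetical helper name, and it prints one value per matching entry in case the attribute appears more than once.

```shell
# Sketch: extract Mds-Computer-Total-Free-nodeCount values from LDIF on stdin.
free_nodes() {
  # Split on the colon plus any following spaces; match the attribute name exactly.
  awk -F': *' '$1 == "Mds-Computer-Total-Free-nodeCount" { print $2 }'
}
```

Usage: `grid-info-search -x -h localhost | free_nodes`. If nothing is printed, the PBS reporter section is probably missing from grid-info-resource-ldif.conf.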

If you have OpenLDAP installed as part of your OS on another system, but do not have Globus installed, you can still check the connectivity to MDS (i.e., to check the host and possibly a campus/dept firewall blocking the port) by using ldapsearch.
ldapsearch -x -p 2135 -h FQDNofGlobusHost \
-b "Mds-Software-deployment=MDS, Mds-Host-hn=FQDNofGlobusHost,Mds-Vo-name=local,o=grid" \
Mds-Service-admin-contact
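On a machine with neither Globus nor the OpenLDAP client tools, a raw TCP connect to port 2135 at least tells you whether a firewall is in the way. This sketch relies on bash's /dev/tcp redirection and the timeout(1) command from newer GNU coreutils; it only shows that something accepts connections on the port, not that slapd answers queries. port_open is a hypothetical helper name.

```shell
# Sketch: succeed if HOST accepts a TCP connection on PORT (default 2135).
port_open() {
  host=$1; port=${2:-2135}
  # bash -c gets host as $0 and port as $1; give up after 5 seconds so a
  # firewall that silently drops packets doesn't hang the check.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

Usage: `if port_open FQDNofGlobusHost; then echo "2135 reachable"; else echo "no connection (host down or port filtered)"; fi`.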

Finally ...

These notes are definitely a work in progress. Obviously, getting the MDS part to work is the biggest challenge. If you have anything to add, or if I got something wrong, please e-mail me, steve //AT// math.tamu.edu.


Last update: 08Dec2006 slj

Copyright © 2007. Steve Johnson, Texas A&M University. All rights reserved.
