Batch Processing on the Calclabs
Long-running batch jobs may be submitted from gauss.math.tamu.edu. These jobs will be scheduled on available desktops in the Calclabs. "Availability" is defined as a block of more than 3 hours during which no class is scheduled in a particular room. We are using the TORQUE Resource Manager (based on OpenPBS) for batch scheduling.
You may log in to gauss using SSH (or GSI-SSH if you have recognized grid credentials). There are no graphical logins available on gauss. There should be NO processing on gauss, as it is for edit/compile/debug, job submission, and data collection purposes only. Any long-running processes on gauss will be terminated without warning, and your account will be closed if the abuse continues.
Please consider checkpointing your application appropriately as nodes may go down during the running of your job. Also, do not submit jobs in rapid succession. Doing so may cause the scheduler to put all of them onto a single node. Give the scheduler a few seconds between submitting jobs to avoid this condition.
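One way to leave a gap between submissions is a small wrapper function, sketched below. The function name submit_spaced, the QSUB and SUBMIT_DELAY variables, and the 5-second default are assumptions made for illustration; this page only asks for "a few seconds" between jobs.

```shell
#!/bin/sh
# submit_spaced QUEUE SCRIPT...: submit each script, pausing between calls.
# QSUB and SUBMIT_DELAY are hypothetical overrides introduced for this sketch.
submit_spaced() {
    queue=$1
    shift
    first=1
    for script in "$@"; do
        # Pause before every submission except the first.
        if [ "$first" -eq 0 ]; then
            sleep "${SUBMIT_DELAY:-5}"
        fi
        first=0
        "${QSUB:-qsub}" -q "$queue" "$script" || return 1
    done
}
```

For example, submit_spaced night job1.pbs job2.pbs job3.pbs would submit three scripts to the night queue with a pause between each.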
The queues are configured as follows:
| Queue | Time limit | Mem limit |
|---|---|---|
| night | 9 hours | 2.0GB |
| weekend | 42 hours | 2.0GB |
Please check this page for updates to the queue configuration.
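As a rough illustration of how the time limits map to a queue choice, the sketch below picks the shortest queue that covers an estimated run time in whole hours. The function name pick_queue is hypothetical, and the limits are copied from the table above, so they would need updating if the queue configuration changes.

```shell
#!/bin/sh
# pick_queue HOURS: print the shortest queue whose time limit covers HOURS.
# The 9 and 42 hour limits are taken from the queue table and may change.
pick_queue() {
    hours=$1
    if [ "$hours" -le 9 ]; then
        echo night
    elif [ "$hours" -le 42 ]; then
        echo weekend
    else
        echo "no queue allows $hours hours" >&2
        return 1
    fi
}
```

It could be combined with qsub as qsub -q "$(pick_queue 12)" myserjob.pbs, which would select the weekend queue.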
Jobs are submitted using the qsub command.
Example (serial job):
qsub -q night myserjob.pbs
The contents of the myserjob.pbs file for a serial (single node) application may look something like this:
#PBS -l walltime=02:00:00
#PBS -q night
cd $HOME/mysubdir
./myprog -j 1 -f outfile.lis << EOT
2
45.5 62
infile.txt
14
EOT
exit 0
In the above example, the #PBS directives are interpreted by the qsub command, so these options do not need to be repeated on the command line. Here we set a walltime limit of 2 hours and use the night queue. When the job starts, we change to the $HOME/mysubdir subdirectory and execute myprog in that directory with the command line arguments -j 1 -f outfile.lis. The lines between << EOT and the closing EOT are fed to the program on standard input, in place of the answers you would type when running myprog interactively.
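To see how the EOT block reaches a program, the following sketch replaces myprog with a small shell function that reads two lines from standard input. The function read_input is a hypothetical stand-in and is not part of the batch system.

```shell
#!/bin/sh
# read_input is a hypothetical stand-in for myprog: it reads two lines from
# standard input, just as an interactive program would read typed answers.
read_input() {
    read -r count
    read -r x y
    echo "count=$count x=$x y=$y"
}

# Everything between "<< EOT" and the closing EOT line is fed to stdin.
read_input << EOT
2
45.5 62
EOT
# prints: count=2 x=45.5 y=62
```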
Matlab is only available to Texas A&M University faculty, staff, and students. This example is similar to the one above. We will be running a serial job in which we call Matlab.
#PBS -l walltime=00:01:00
#PBS -q night
#PBS -j oe
matlab -nojvm < $HOME/myfile.m > $HOME/matlab.out
exit 0
In the above code we use #PBS -j oe to combine the script's stdout and stderr into one file. We start matlab with the -nojvm option to prevent loading the Java VM. The input for matlab is read from $HOME/myfile.m and output is stored in $HOME/matlab.out.
For a parallel job, the script file myparajob.pbs would issue the mpirun command as follows:
#PBS -l walltime=00:10:00
#PBS -q night
#PBS -l nodes=8
cd $HOME/mysubdir
mpirun -np 8 ./myparaprog -j 1 -f outfile.list << EOT
6.6
infile.txt
-1 20
qfile.out
EOT
exit 0
This job is submitted with:
qsub -q night -l nodes=8 myparajob.pbs
In this example, we set a walltime limit of 10 minutes and request the job
be started from the night queue on 8 nodes.
When TORQUE starts your job, it will allocate a list of nodes to be used.
The list of per-job nodes is in a file pointed to by the $PBS_NODEFILE
environment variable. To see this list of nodes, include the following
command in myparajob.pbs:
cat $PBS_NODEFILE
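The node file can also be used to derive the process count instead of hard-coding it. The sketch below assumes the file holds one line per allocated slot; the function name count_slots is hypothetical.

```shell
#!/bin/sh
# count_slots FILE: print the number of lines in a node file.
# Assumes one line per allocated slot, which is an assumption of this sketch.
count_slots() {
    awk 'END { print NR }' "$1"
}
```

Inside myparajob.pbs this could be used as mpirun -np "$(count_slots "$PBS_NODEFILE")" ./myparaprog, so the process count follows whatever the scheduler allocated.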
You can see the status of your job using the qstat command.
You can cancel a queued job ('Q' status in qstat) with qdel jobID, where jobID is the job identifier shown by qstat. You can send Unix signal 15 (SIGTERM) to a running job ('R' status) with qsig -s 15 jobID.
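A job script can react to signal 15 by installing a shell trap, for instance to leave a marker or save state before stopping. The following is a minimal sketch, assuming the signal is delivered to the shell that runs the job script; the names on_term, TERM_SEEN, and CHECKPOINT_FILE are hypothetical.

```shell
#!/bin/sh
# CHECKPOINT_FILE and on_term are hypothetical names used for illustration.
CHECKPOINT_FILE=${CHECKPOINT_FILE:-./job.interrupted}

# Handler run when the shell receives signal 15 (SIGTERM).
on_term() {
    # A real job would save application state here; this only leaves a marker.
    echo "interrupted" > "$CHECKPOINT_FILE"
    TERM_SEEN=1
}

trap on_term TERM
```

Note that a shell runs a trap only after the current foreground command finishes, so a long-running program is usually started in the background and followed by wait if the handler needs to run promptly.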
You can view the status of the cluster on the Status page.
The system administrators reserve the right to monitor all processes for appropriate use of the resources. Appropriate use is defined as legitimate academic work appropriate for Texas A&M University. Cracking MD5 hashes and running apps such as Folding@Home are not considered legitimate uses. Abuse of the batch system will result in account termination.