One of the most important aspects of Grid Computing is its potential ability to execute distributed communicating jobs. The Distributed Resource Management Application API (DRMAA) specification constitutes a homogeneous interface to different Distributed Resource Management Systems (DRMS) to handle job submission, monitoring and control, and retrieval of finished job status.
In this way, DRMAA can help scientists and engineers express their computational problems by providing a portable, direct interface to DRMS. Several projects are underway to implement the DRMAA specification on different DRMS, such as Sun Grid Engine (SGE), Condor or Torque.
GridWay provides a full-featured native implementation of the DRMAA standard to interface DRMS through the Globus Toolkit. The GridWay DRMAA library can successfully compile and execute the DRMAA test suite (version 1.4.0). Please check the GridWay DRMAA Reference Guide for a complete description of the DRMAA routines.
Although DRMAA could interface with DRMS at different levels, for example at the intranet level with SGE or Condor, with GridWay we will only consider its application at Grid level. In this way, the DRMS (GridWay in our case) will interact with the local job managers (Condor, PBS, SGE…) through the Grid middleware (Globus). This development and execution scheme with DRMAA, GridWay and Globus is depicted in the next figure.
Figure 2. Grid Development Model with DRMAA
DRMAA allows scientists and engineers to express their computational problems in a Grid environment. Capturing the job exit code allows users to define complex jobs, where each job depends on the output and exit code of the previous one. Such jobs may even involve branching, looping and spawning of subtasks, which makes it possible to exploit the parallelism in the workflow of certain types of applications. Let's review some typical scientific application profiles that can benefit from DRMAA.
These are applications that can be divided in an obvious way into a number of independent tasks. The application is asynchronous when the tasks require distinct instruction streams and therefore have different execution times. A sample of this schema with its DRMAA implementation is shown in the following figure.
Figure 3. Embarrassingly Distributed Applications schema
rc = drmaa_init(contact, err_diag);

// Execute initial job and wait for it
rc = drmaa_run_job(job_id, jt, err_diag);
rc = drmaa_wait(job_id, &stat, timeout, rusage, err_diag);

// Execute n jobs simultaneously and wait
rc = drmaa_run_bulk_jobs(job_ids, jt, 1, JOB_NUM, 1, err_diag);
rc = drmaa_synchronize(job_ids, timeout, 1, err_diag);

// Execute final job and wait for it
rc = drmaa_run_job(job_id, jt, err_diag);
rc = drmaa_wait(job_id, &stat, timeout, rusage, err_diag);

rc = drmaa_exit(err_diag);
A Master task assigns a description (input less) of the task to be performed by each Worker. Once all the Workers have completed, the Master task performs some computations in order to evaluate a stop criterion or to assign new tasks to more Workers. Again, the application can be synchronous or asynchronous. The following figure shows an example of a Master-Worker optimization loop and a sample DRMAA implementation.
Figure 4. Master-Worker Applications schema
rc = drmaa_init(contact, err_diag);

// Execute initial job and wait for it
rc = drmaa_run_job(job_id, jt, err_diag);
rc = drmaa_wait(job_id, &stat, timeout, rusage, err_diag);

while (exitstatus != 0)
{
    // Execute n Workers concurrently and wait
    rc = drmaa_run_bulk_jobs(job_ids, jt, 1, JOB_NUM, 1, err_diag);
    rc = drmaa_synchronize(job_ids, timeout, 1, err_diag);

    // Execute the Master, wait and get exit code
    rc = drmaa_run_job(job_id, jt, err_diag);
    rc = drmaa_wait(job_id, &stat, timeout, rusage, err_diag);
    rc = drmaa_wexitstatus(&exitstatus, stat, err_diag);
}

rc = drmaa_exit(err_diag);
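The exit-code-driven loop can be tried out locally without any Grid middleware. The following is only an analogy, not DRMAA code: it assumes a POSIX system and uses fork(), waitpid() and WEXITSTATUS() in the roles that drmaa_run_job(), drmaa_wait() and drmaa_wexitstatus() play above. The function names run_and_wait and master_worker_loop are made up for this sketch.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs one "job" as a child process that exits with exit_code.
 * Returns the captured exit code, or -1 if the job could not be
 * started or did not terminate normally. */
int run_and_wait(int exit_code)
{
    int stat;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);               /* the child "job" ends at once */
    if (waitpid(pid, &stat, 0) < 0 || !WIFEXITED(stat))
        return -1;
    return WEXITSTATUS(stat);
}

/* Repeats the Master step until it exits with code 0.
 * In this simulation the Master of round r exits with
 * (rounds_needed - r), so the stop criterion is met after
 * rounds_needed rounds. Returns the number of rounds executed,
 * or -1 if a job failed to run. */
int master_worker_loop(int rounds_needed)
{
    int rounds = 0;
    int exitstatus;

    assert(rounds_needed > 0 && rounds_needed < 256);
    do {
        rounds++;
        /* The Workers of this round would be submitted and awaited here. */
        exitstatus = run_and_wait(rounds_needed - rounds);
    } while (exitstatus > 0);

    return exitstatus == 0 ? rounds : -1;
}
```

Calling master_worker_loop(4) runs four simulated Master jobs and stops when the last one reports exit code 0.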
Computing π numerically is a well-known exercise. For our purposes, we will calculate the integral of the function f(x) = 4/(1+x^2). π is then the integral of f(x) over the interval [0,1].
To calculate the whole integral, it is convenient to divide the interval [0,1] into many subintervals, distribute them among several tasks, and let each task compute the area under its share. The following program computes the area of the set of intervals assigned to a given task:
#include <stdio.h>
#include <stdlib.h>

int main (int argc, char** args)
{
    int task_id;
    int total_tasks;
    long long int n;
    long long int i;
    double l_sum, x, h;

    if (argc != 4)
    {
        fprintf(stderr, "Usage: pi <task_id> <total_tasks> <intervals>\n");
        return -1;
    }

    task_id     = atoi(args[1]);
    total_tasks = atoi(args[2]);
    n           = atoll(args[3]);

    fprintf(stderr, "task_id=%d total_tasks=%d n=%lld\n", task_id, total_tasks, n);

    h     = 1.0/n;
    l_sum = 0.0;

    /* Each task takes every total_tasks-th interval, starting at task_id */
    for (i = task_id; i < n; i += total_tasks)
    {
        x = (i + 0.5)*h;
        l_sum += 4.0/(1.0 + x*x);
    }

    l_sum *= h;

    printf("%0.12g\n", l_sum);

    return 0;
}
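For reference, the identity behind this exercise and its midpoint-rule approximation can be written out as follows. The symbols n (number of intervals), h (interval width) and x_i (interval midpoints) are notation chosen for this sketch.

```latex
\int_0^1 \frac{4}{1+x^2}\,dx
  = 4\arctan(x)\Big|_0^1
  = 4\cdot\frac{\pi}{4}
  = \pi,
\qquad
\pi \approx h \sum_{i=0}^{n-1} \frac{4}{1+x_i^2},
\quad x_i = \left(i+\tfrac{1}{2}\right)h,
\quad h = \frac{1}{n}.
```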
We will use this program (pi) to develop our DRMAA distributed version.
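Before distributing anything, it can help to check locally that the partial sums of all tasks really add up to π. The following sketch repeats the loop of the pi program as a function and sums the results of every task in a single process; the names partial_pi and total_pi are introduced here for illustration only.

```c
#include <assert.h>

/* Midpoint-rule sum over the intervals assigned to one task:
 * task_id, task_id + total_tasks, task_id + 2*total_tasks, ... */
double partial_pi(int task_id, int total_tasks, long long n)
{
    double h, sum = 0.0;
    long long i;

    assert(n > 0 && total_tasks > 0);
    assert(task_id >= 0 && task_id < total_tasks);

    h = 1.0 / (double)n;
    for (i = task_id; i < n; i += total_tasks) {
        double x = ((double)i + 0.5) * h;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}

/* Adds the partial results of all tasks, which is the aggregation
 * step the distributed version has to perform. */
double total_pi(int total_tasks, long long n)
{
    double pi = 0.0;
    int t;

    for (t = 0; t < total_tasks; t++)
        pi += partial_pi(t, total_tasks, n);
    return pi;
}
```

Because every interval index belongs to exactly one task, the total does not depend on the number of tasks, apart from floating-point rounding.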
Let us start with the definition of each task. As you can see, the previous program needs the task number, the total number of tasks, and the number of intervals. The first two are available when building job templates through the DRMAA_GW_TASK_ID and DRMAA_GW_TOTAL_TASKS predefined strings.
Also, each task must generate a different standard output file with its partial result. We can use the standard DRMAA_PLACEHOLDER_INCR predefined string to set up a different filename for each task, so they will not overwrite each other's output.
void setup_job_template(drmaa_job_template_t **jt)
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    int rc;
    char cwd[DRMAA_ATTR_BUFFER];

    /* Arguments of pi: task id, total number of tasks, number of intervals */
    const char *args[4] = {DRMAA_GW_TASK_ID,
                           DRMAA_GW_TOTAL_TASKS,
                           "10000000",
                           NULL};

    rc = drmaa_allocate_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);

    getcwd(cwd, DRMAA_ATTR_BUFFER);

    rc = drmaa_set_attribute(*jt, DRMAA_WD, cwd,
                             error, DRMAA_ERROR_STRING_BUFFER);

    rc = drmaa_set_attribute(*jt, DRMAA_JOB_NAME, "pi.drmaa",
                             error, DRMAA_ERROR_STRING_BUFFER);

    rc = drmaa_set_attribute(*jt, DRMAA_REMOTE_COMMAND, "pi",
                             error, DRMAA_ERROR_STRING_BUFFER);

    rc = drmaa_set_vector_attribute(*jt, DRMAA_V_ARGV, args,
                                    error, DRMAA_ERROR_STRING_BUFFER);

    /* One standard output file per task */
    rc = drmaa_set_attribute(*jt, DRMAA_OUTPUT_PATH,
                             "stdout."DRMAA_PLACEHOLDER_INCR,
                             error, DRMAA_ERROR_STRING_BUFFER);
}
The DRMAA program just submits a given number of tasks, each of which computes its section of the previous integral. Then we synchronize these tasks and aggregate the partial results to get the total value. Note that these results are passed to the DRMAA program through the standard output of the tasks.
int main(int argc, char **argv)
{
    int rc;
    int end, i;
    char error[DRMAA_ERROR_STRING_BUFFER];
    char attr_buffer[DRMAA_ATTR_BUFFER];
    /* NULL-terminated list: wait for every job in the session */
    const char *job_ids[2] = {DRMAA_JOB_IDS_SESSION_ALL, NULL};
    drmaa_job_template_t *jt;
    drmaa_job_ids_t *jobids;
    FILE *fp;
    double pi, pi_t;

    if (argc != 2)
    {
        fprintf(stderr, "Usage drmaa_pi <number_of_tasks>\n");
        return -1;
    }
    else
        end = atoi(argv[1]) - 1;

    rc = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

    setup_job_template(&jt);

    rc = drmaa_run_bulk_jobs(&jobids, jt, 0, end, 1,
                             error, DRMAA_ERROR_STRING_BUFFER);

    fprintf(stderr, "Waiting for bulk job to finish...\n");

    /* The third argument (dispose) is 1: job information is discarded */
    rc = drmaa_synchronize(job_ids, DRMAA_TIMEOUT_WAIT_FOREVER, 1,
                           error, DRMAA_ERROR_STRING_BUFFER);

    fprintf(stderr, "All Jobs finished\n");

    pi = 0.0;

    for (i = 0; i <= end; i++)
    {
        snprintf(attr_buffer, DRMAA_ATTR_BUFFER, "stdout.%d", i);

        fp = fopen(attr_buffer, "r");
        if (fp == NULL)
        {
            perror(attr_buffer);
            continue;
        }

        fscanf(fp, "%lf", &pi_t);
        fprintf(stderr, "Partial computed by task %i = %1.30f\n", i, pi_t);
        fclose(fp);

        pi += pi_t;
    }

    drmaa_release_job_ids(jobids);

    fprintf(stderr, "\nPI=%1.30f\n", pi);

    drmaa_exit(NULL, 0);

    return 0;
}
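The aggregation step can be exercised on its own, without submitting any job, by reading result files that already exist. The sketch below assumes files named prefix.0, prefix.1, and so on, each holding one floating-point number, as the tasks above produce with the prefix "stdout". The function name sum_partial_results is hypothetical and not part of DRMAA or GridWay.

```c
#include <assert.h>
#include <stdio.h>

/* Sums the values stored in <prefix>.0 ... <prefix>.<tasks-1>.
 * Returns 0 and stores the sum in *total on success; returns -1
 * and leaves *total untouched if a file is missing or does not
 * start with a number. */
int sum_partial_results(const char *prefix, int tasks, double *total)
{
    char path[256];
    double sum = 0.0;
    int i;

    assert(prefix != NULL && total != NULL && tasks > 0);

    for (i = 0; i < tasks; i++) {
        double part;
        FILE *fp;

        snprintf(path, sizeof path, "%s.%d", prefix, i);

        fp = fopen(path, "r");
        if (fp == NULL)
            return -1;

        if (fscanf(fp, "%lf", &part) != 1) {
            fclose(fp);
            return -1;
        }
        fclose(fp);

        sum += part;
    }

    *total = sum;
    return 0;
}
```

Reading into a double with "%lf" keeps the twelve significant digits that the pi program prints; a float would lose about half of them.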
This chapter is a tutorial for getting started with programming DRMAA applications with GridWay. Although it is not necessary to already know how GridWay works, prior experience submitting and controlling jobs with GridWay will come in handy. This tutorial shows how to use the most important functions in the DRMAA standard and gives you some hints on using the GridWay DRMAA library.
The source code for the following examples can be found in the “$GW_LOCATION/examples” directory.
You have to include the following line in your C sources to use the GridWay DRMAA C bindings library:
#include "drmaa.h"
Also add the following compiler options to link your program with the DRMAA library:
-L $GW_LOCATION/lib -I $GW_LOCATION/include -ldrmaa
You have to import the GridWay DRMAA JAVA package:
import org.ggf.drmaa.*;
Add the following option to the javac command:
-classpath $(CLASSPATH):$GW_LOCATION/lib/drmaa.jar
Also do not forget to update your shared library path:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$GW_LOCATION/lib
Let us start with the initialization steps of every DRMAA program. Before any call to a DRMAA function you must start a DRMAA session. The session is used to manage the jobs submitted in your Grid application. Before ending the program you should disengage from the previously started session, to free the internal session structures.
Also, as you will see, every DRMAA function returns an error code that allows you to check whether the call was successful (DRMAA_ERRNO_SUCCESS if everything goes well). Throughout this tutorial, for the sake of clarity, we will omit the error checking code in most cases. But remember that you should always check return codes.
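One way to keep return-code checks short is to route every call through a small helper. The sketch below illustrates the pattern without linking against the DRMAA library: fake_drmaa_call, check, ERR_BUFFER and ERRNO_SUCCESS are all stand-ins invented for this example (for a real DRMAA function, DRMAA_ERROR_STRING_BUFFER and DRMAA_ERRNO_SUCCESS respectively).

```c
#include <assert.h>
#include <stdio.h>

#define ERR_BUFFER    128   /* stand-in for DRMAA_ERROR_STRING_BUFFER */
#define ERRNO_SUCCESS 0     /* stand-in for DRMAA_ERRNO_SUCCESS */

/* Hypothetical stand-in for a DRMAA function: like the real ones, it
 * returns an error code and writes a diagnosis into the error buffer. */
int fake_drmaa_call(int fail, char *error, size_t error_len)
{
    assert(error != NULL && error_len > 0);
    if (fail) {
        snprintf(error, error_len, "simulated failure");
        return 1;
    }
    return ERRNO_SUCCESS;
}

/* Prints a diagnosis when rc signals an error.
 * Returns 0 if the call succeeded and -1 otherwise, so the caller
 * can write: if (check("name", rc, error) != 0) return -1; */
int check(const char *what, int rc, const char *error)
{
    if (rc != ERRNO_SUCCESS) {
        fprintf(stderr, "%s failed: %s\n", what, error);
        return -1;
    }
    return 0;
}
```

The helper only reports the error; whether to abort, retry or continue is left to the caller, which matches how the examples in this tutorial handle failures.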
The following example shows you how to manage a DRMAA session. It also shows some information about the DRMAA implementation and the DRMS you are using.
char error[DRMAA_ERROR_STRING_BUFFER];
int result;
char contact[DRMAA_ATTR_BUFFER];
unsigned int major;
unsigned int minor;
char drm[DRMAA_ATTR_BUFFER];
char impl[DRMAA_ATTR_BUFFER];

/* Start the DRMAA session */
result = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

if (result != DRMAA_ERRNO_SUCCESS)
{
    fprintf(stderr, "drmaa_init() failed: %s\n", error);
    return -1;
}
else
    printf("drmaa_init() success \n");

/* Get information about the DRMAA implementation and the DRMS */
drmaa_get_contact(contact, DRMAA_ATTR_BUFFER-1,
                  error, DRMAA_ERROR_STRING_BUFFER-1);

drmaa_version(&major, &minor, error, DRMAA_ERROR_STRING_BUFFER);

drmaa_get_DRM_system(drm, DRMAA_ATTR_BUFFER-1,
                     error, DRMAA_ERROR_STRING_BUFFER-1);

drmaa_get_DRMAA_implementation(impl, DRMAA_ATTR_BUFFER-1,
                               error, DRMAA_ERROR_STRING_BUFFER-1);

printf("Using %s, details:\n", impl);
printf("\t DRMAA version %i.%i\n", major, minor);
printf("\t DRMS %s (contact: %s)\n", drm, contact);

/* End the session and free its internal structures */
result = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER-1);

if (result != DRMAA_ERRNO_SUCCESS)
{
    fprintf(stderr, "drmaa_exit() failed: %s\n", error);
    return -1;
}

printf("drmaa_exit() success \n");
Using DRMAA for GridWay 4.7, details:
     DRMAA version 1.0
     DRMS GridWay (contact: localhost)
This example shows the same program using the DRMAA JAVA bindings.
import org.ggf.gridway.drmaa.*;
import java.util.*;

public class Howto1
{
    public static void main (String[] args)
    {
        SessionFactory factory = SessionFactory.getFactory();
        Session session = factory.getSession();

        try
        {
            session.init(null);

            System.out.println("Session Init success");

            System.out.println ("Using " + session.getDRMAAImplementation()
                                + ", details:");
            System.out.println ("\t DRMAA version " + session.getVersion());
            System.out.println ("\t DRMS " + session.getDRMSInfo()
                                + "(contact: " + session.getContact() + ")");

            session.exit();

            System.out.println("Session Exit success");
        }
        catch (DrmaaException e)
        {
            e.printStackTrace();
        }
    }
}
Now that we have an active session, how do we use it to submit jobs? The first thing to do is to provide a description of your job. A job is described by its job template (a drmaa_job_template_t variable), which in turn is a structure that stores information about your job (things like the executable, its arguments or the output files).
The DRMAA standard provides several predefined strings to refer to common job template attributes (those starting with DRMAA_, like DRMAA_REMOTE_COMMAND). The GridWay library also defines some macros to refer to GridWay-specific attributes (those starting with DRMAA_GW_, like DRMAA_GW_INPUT_FILES).
There are two kinds of attributes: scalar and vector. Scalar attributes are simple strings (char *) and correspond to template attributes of the form:
attribute = value
You can use drmaa_set_attribute() and drmaa_get_attribute() to manage these scalar attributes. Vector attributes, on the other hand, correspond to job template variables with one or more values, i.e.:
attribute = value1 value2 ... valueN
A vector attribute is a NULL-terminated array of strings (char **). You can use drmaa_set_vector_attribute() and drmaa_get_next_attr_value() to deal with vector attributes.
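Since a vector attribute carries no explicit length, code that handles one has to walk the array until it finds the terminating NULL. The following sketch shows that traversal and renders a vector in the "attribute = value1 value2 ... valueN" form shown above. Both vector_length and format_vector_attribute are helper names made up for this illustration; they are not part of the DRMAA API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts the entries of a NULL-terminated array of strings. */
size_t vector_length(const char **values)
{
    size_t n = 0;

    assert(values != NULL);
    while (values[n] != NULL)
        n++;
    return n;
}

/* Writes "<name> = <value1> <value2> ..." into out.
 * Returns 0 on success, or -1 if the result (including the
 * terminating '\0') does not fit into len bytes. */
int format_vector_attribute(char *out, size_t len,
                            const char *name, const char **values)
{
    size_t count = vector_length(values);
    size_t needed = strlen(name) + 2;           /* "<name> =" */
    size_t i;

    for (i = 0; i < count; i++)
        needed += 1 + strlen(values[i]);        /* " <value>" */

    if (needed + 1 > len)
        return -1;

    strcpy(out, name);
    strcat(out, " =");
    for (i = 0; i < count; i++) {
        strcat(out, " ");
        strcat(out, values[i]);
    }
    return 0;
}
```

Forgetting the final NULL is the usual mistake with such arrays: the traversal then runs past the end of the array, so arrays should be declared one element larger than the number of values.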
We will use a common job template for the rest of the tutorial, so we will make a function to set up this job template. Remember to check the return codes of the DRMAA functions.
void setup_job_template(drmaa_job_template_t **jt)
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    int rc;
    char cwd[DRMAA_ATTR_BUFFER];

    /* Vector attribute values: NULL-terminated array of strings */
    const char *args[3] = {"-l", "-a", NULL};

    /* Allocate the job template */
    rc = drmaa_allocate_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_allocate_job_template() failed: %s\n", error);
        exit(-1);
    }

    if ( getcwd(cwd, DRMAA_ATTR_BUFFER) == NULL )
    {
        perror("Error getting current working directory");
        exit(-1);
    }

    /* Scalar attribute: working directory */
    rc = drmaa_set_attribute(*jt, DRMAA_WD, cwd,
                             error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS )
    {
        fprintf(stderr, "Error setting job template attribute: %s\n", error);
        exit(-1);
    }

    rc = drmaa_set_attribute(*jt, DRMAA_JOB_NAME, "ht2",
                             error, DRMAA_ERROR_STRING_BUFFER-1);

    /* Executable to run */
    rc = drmaa_set_attribute(*jt, DRMAA_REMOTE_COMMAND, "/bin/ls",
                             error, DRMAA_ERROR_STRING_BUFFER-1);

    /* Vector attribute: arguments of the executable */
    rc = drmaa_set_vector_attribute(*jt, DRMAA_V_ARGV, args,
                                    error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS )
    {
        fprintf(stderr, "Error setting remote command arguments: %s\n", error);
        exit(-1);
    }

    /* Standard output and error files, named after the job id */
    rc = drmaa_set_attribute(*jt, DRMAA_OUTPUT_PATH,
                             "stdout."DRMAA_GW_JOB_ID,
                             error, DRMAA_ERROR_STRING_BUFFER-1);

    rc = drmaa_set_attribute(*jt, DRMAA_ERROR_PATH,
                             "stderr."DRMAA_GW_JOB_ID,
                             error, DRMAA_ERROR_STRING_BUFFER-1);
}
If everything went well, the following job template will be generated:
#This file was automatically generated by the GridWay DRMAA library
EXECUTABLE=/bin/ls
ARGUMENTS= -l -a
STDOUT_FILE=stdout.${JOB_ID}
STDERR_FILE=stderr.${JOB_ID}
RESCHEDULE_ON_FAILURE=no
NUMBER_OF_RETRIES=3
This fragment of code shows you how to construct the same template using the DRMAA JAVA bindings.
jt = session.createJobTemplate();

jt.setWorkingDirectory(java.lang.System.getProperty("user.dir"));
jt.setJobName("ht2");
jt.setRemoteCommand("/bin/ls");
jt.setArgs(new String[] {"-l","-a"});
jt.setOutputPath("stdout." + SessionImpl.DRMAA_GW_JOB_ID);
jt.setErrorPath ("stderr." + SessionImpl.DRMAA_GW_JOB_ID);
We can now submit our “ls” to the Grid. The next example shows you how to submit your job, and how to synchronize its execution. The resource usage made by your job is also shown.
int main(int argc, char *argv[])
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    int result;
    drmaa_job_template_t * jt;
    char job_id[DRMAA_JOBNAME_BUFFER];
    char job_id_out[DRMAA_JOBNAME_BUFFER];
    drmaa_attr_values_t * rusage;
    int stat;
    char attr_value[DRMAA_ATTR_BUFFER];

    result = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( result != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_init() failed: %s\n", error);
        return -1;
    }

    /* Build the job template defined earlier */
    setup_job_template(&jt);

    /* Submit the job; its id is returned in job_id */
    drmaa_run_job(job_id, DRMAA_JOBNAME_BUFFER-1, jt,
                  error, DRMAA_ERROR_STRING_BUFFER-1);

    fprintf(stderr, "Job successfully submitted ID: %s\n", job_id);

    /* Block until the job finishes */
    result = drmaa_wait(job_id, job_id_out, DRMAA_JOBNAME_BUFFER-1, &stat,
                        DRMAA_TIMEOUT_WAIT_FOREVER, &rusage,
                        error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( result != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_wait() failed: %s\n", error);
        return -1;
    }

    /* Extract the exit code from the status returned by drmaa_wait() */
    drmaa_wexitstatus(&stat, stat, error, DRMAA_ERROR_STRING_BUFFER-1);

    fprintf(stderr, "Job finished with exit code %i, usage: %s\n", stat, job_id);

    /* Print the resource usage values one by one */
    while (drmaa_get_next_attr_value(rusage, attr_value, DRMAA_ATTR_BUFFER-1)
           != DRMAA_ERRNO_NO_MORE_ELEMENTS )
        fprintf(stderr, "\t%s\n", attr_value);

    /* Free the structures allocated by the library */
    drmaa_release_attr_values(rusage);

    drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER-1);

    drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER-1);

    return 0;
}
This example shows the same program using the DRMAA JAVA bindings.
try
{
    session.init(null);

    String id = session.runJob(jt);

    System.out.println("Job successfully submitted ID: " + id);

    JobInfo info = session.wait(id, Session.DRMAA_TIMEOUT_WAIT_FOREVER);

    System.out.println("Job usage:");

    Map rmap = info.getResourceUsage();
    Iterator r = rmap.keySet().iterator();

    while(r.hasNext())
    {
        String name2 = (String) r.next();
        String value = (String) rmap.get(name2);
        System.out.println(" " + name2 + "=" + value);
    }

    session.deleteJobTemplate(jt);
    session.exit();
}
catch (DrmaaException e)
{
    e.printStackTrace();
}
But you can do more with a job than just submit it. The DRMAA standard allows you to control your jobs (kill, hold, release, stop, …) even if they were not submitted within a DRMAA session. See the following example:
int main(int argc, char *argv[])
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    int rc;
    drmaa_job_template_t * jt;
    char job_id[DRMAA_JOBNAME_BUFFER];
    const char *job_ids[2] = {DRMAA_JOB_IDS_SESSION_ALL, NULL};
    int status;

    drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

    setup_job_template(&jt);

    /* Submit the job in hold state */
    drmaa_set_attribute(jt, DRMAA_JS_STATE, DRMAA_SUBMISSION_STATE_HOLD,
                        error, DRMAA_ERROR_STRING_BUFFER-1);

    drmaa_run_job(job_id, DRMAA_JOBNAME_BUFFER, jt,
                  error, DRMAA_ERROR_STRING_BUFFER-1);

    fprintf(stdout, "Your job has been submitted with id: %s\n", job_id);

    sleep(5);

    /* Query the job state */
    drmaa_job_ps(job_id, &status, error, DRMAA_ERROR_STRING_BUFFER);

    fprintf(stdout, "Job state is: %s\n", drmaa_gw_strstatus(status));

    sleep(1);

    fprintf(stdout, "Releasing the Job\n");

    /* Release the held job so it can be scheduled */
    rc = drmaa_control(job_id, DRMAA_CONTROL_RELEASE,
                       error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_control() failed: %s\n", error);
        return -1;
    }

    drmaa_job_ps(job_id, &status, error, DRMAA_ERROR_STRING_BUFFER);

    fprintf(stdout, "Job state is: %s\n", drmaa_gw_strstatus(status));

    fprintf(stdout, "Synchronizing with job...\n");

    /* Wait for all the jobs in the session; keep their information (dispose = 0) */
    rc = drmaa_synchronize(job_ids, DRMAA_TIMEOUT_WAIT_FOREVER, 0,
                           error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_synchronize failed: %s\n", error);
        return -1;
    }

    fprintf(stdout, "Killing the Job\n");

    /* Terminate the job */
    rc = drmaa_control(job_id, DRMAA_CONTROL_TERMINATE,
                       error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_control() failed: %s\n", error);
        return -1;
    }

    fprintf(stdout, "Your job has been deleted\n");

    drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER-1);

    drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER-1);

    return 0;
}
Let us see the same program in JAVA:
try
{
    session.init(null);

    setup_job_template();

    String id = session.runJob(jt);

    System.out.println("Job successfully submitted ID: " + id);

    try
    {
        Thread.sleep(5 * 1000);
    }
    catch (InterruptedException e)
    {
        // Don't care
    }

    printJobStatus(session.getJobProgramStatus(id));

    try
    {
        Thread.sleep(1000);
    }
    catch (InterruptedException e)
    {
        // Don't care
    }

    System.out.println("Releasing the Job");

    session.control(id, Session.DRMAA_CONTROL_RELEASE);

    printJobStatus(session.getJobProgramStatus(id));

    System.out.println("Synchronizing with job...");

    session.synchronize(
        Collections.singletonList(Session.DRMAA_JOB_IDS_SESSION_ALL),
        Session.DRMAA_TIMEOUT_WAIT_FOREVER,
        false);

    System.out.println("Killing the Job");

    session.control(id, Session.DRMAA_CONTROL_TERMINATE);

    session.deleteJobTemplate(jt);
    session.exit();
}
catch (DrmaaException e)
{
    e.printStackTrace();
}
Bulk jobs are a direct way to express parametric computations. A bulk job is a set of independent (and very similar) tasks that use the same job template. You can use the DRMAA_PLACEHOLDER_INCR constant to assign different input/output files to each task. DRMAA_PLACEHOLDER_INCR is a unique identifier of each job (task) in the bulk (array) job. In the GridWay DRMAA library it corresponds to the ${TASK_ID} parameter.
int main(int argc, char *argv[])
{
    char error[DRMAA_ERROR_STRING_BUFFER];
    int rc;
    int stat;
    drmaa_job_template_t * jt;
    drmaa_attr_values_t * rusage;
    drmaa_job_ids_t * jobids;
    char value[DRMAA_ATTR_BUFFER];
    const char * job_ids[2] = {DRMAA_JOB_IDS_SESSION_ALL, NULL};
    char job_id_out[DRMAA_JOBNAME_BUFFER];
    int rcj;

    drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER-1);

    setup_job_template(&jt);

    /* One standard output file per task */
    drmaa_set_attribute(jt, DRMAA_OUTPUT_PATH,
                        "stdout."DRMAA_PLACEHOLDER_INCR,
                        error, DRMAA_ERROR_STRING_BUFFER-1);

    /* Submit tasks with indices 0 to 4, in steps of 1 */
    rc = drmaa_run_bulk_jobs(&jobids, jt, 0, 4, 1,
                             error, DRMAA_ERROR_STRING_BUFFER-1);

    if ( rc != DRMAA_ERRNO_SUCCESS)
    {
        fprintf(stderr, "drmaa_run_bulk_jobs() failed: %s\n", error);
        return -1;
    }

    fprintf(stderr, "Bulk job successfully submitted IDs are:\n");

    /* List the ids of the submitted tasks */
    do
    {
        rc = drmaa_get_next_job_id(jobids, value, DRMAA_ATTR_BUFFER-1);

        if ( rc == DRMAA_ERRNO_SUCCESS )
            fprintf(stderr, "\t%s\n", value);

    } while (rc != DRMAA_ERRNO_NO_MORE_ELEMENTS);

    fprintf(stderr, "Waiting for bulk job to finish...\n");

    /* Wait for all the tasks; keep their information (dispose = 0) */
    drmaa_synchronize(job_ids, DRMAA_TIMEOUT_WAIT_FOREVER, 0,
                      error, DRMAA_ERROR_STRING_BUFFER-1);

    fprintf(stderr, "All Jobs finished\n");

    do
    {
        rcj = drmaa_get_next_job_id(jobids, value, DRMAA_ATTR_BUFFER-1);

        if ( rcj == DRMAA_ERRNO_SUCCESS )
        {
            /* Retrieve exit status and resource usage of each task */
            drmaa_wait(value, job_id_out, DRMAA_JOBNAME_BUFFER-1, &stat,
                       DRMAA_TIMEOUT_WAIT_FOREVER, &rusage,
                       error, DRMAA_ERROR_STRING_BUFFER-1);

            drmaa_wexitstatus(&stat, stat, error, DRMAA_ERROR_STRING_BUFFER-1);

            fprintf(stderr, "Rusage for task %s (exit code %i)\n", value, stat);

            do
            {
                rc = drmaa_get_next_attr_value(rusage, value,
                                               DRMAA_ATTR_BUFFER-1);

                if ( rc == DRMAA_ERRNO_SUCCESS )
                    fprintf(stderr, "\t%s\n", value);

            } while (rc != DRMAA_ERRNO_NO_MORE_ELEMENTS);

            drmaa_release_attr_values(rusage);
        }
    } while (rcj != DRMAA_ERRNO_NO_MORE_ELEMENTS);

    drmaa_release_job_ids(jobids);

    drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER-1);

    drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER-1);

    return 0;
}
Finally, a bulk job in JAVA:
try
{
    session.init(null);

    int start = 0;
    int end   = 4;
    int step  = 1;
    int i;
    String id;

    java.util.List ids = session.runBulkJobs(jt, start, end, step);
    java.util.Iterator iter = ids.iterator();

    System.out.println("Bulk job successfully submitted IDs are: ");

    while(iter.hasNext())
    {
        System.out.println("\t" + iter.next());
    }

    session.deleteJobTemplate(jt);

    session.synchronize(
        Collections.singletonList(Session.DRMAA_JOB_IDS_SESSION_ALL),
        Session.DRMAA_TIMEOUT_WAIT_FOREVER,
        false);

    for (int count = start; count <= end; count += step)
    {
        JobInfo info = session.wait(Session.DRMAA_JOB_IDS_SESSION_ANY,
                                    Session.DRMAA_TIMEOUT_WAIT_FOREVER);

        System.out.println("Job usage:");

        Map rmap = info.getResourceUsage();
        Iterator r = rmap.keySet().iterator();

        while(r.hasNext())
        {
            String name2 = (String) r.next();
            String value = (String) rmap.get(name2);
            System.out.println(" " + name2 + "=" + value);
        }
    }

    session.exit();
}
catch (DrmaaException e)
{
    System.out.println("Error: " + e.getMessage());
}
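The start, end and step arguments of drmaa_run_bulk_jobs() determine how many tasks are created: one for each index from start to end, inclusive, in increments of step. The sketch below computes that count and builds the output file name of one task. It assumes that the incremental placeholder expands to the task index, which is what the examples in this guide rely on with start 0 and step 1; for other values, check how your DRMS numbers its tasks. The names bulk_task_count and task_output_name are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

/* Number of tasks in a bulk job with indices
 * start, start + incr, start + 2*incr, ... up to and including end. */
int bulk_task_count(int start, int end, int incr)
{
    assert(incr > 0 && start <= end);
    return (end - start) / incr + 1;
}

/* Builds "<prefix>.<index>", the file name a task would get under the
 * assumption stated above. Returns 0 on success, or -1 if the name
 * does not fit into len bytes. */
int task_output_name(char *out, size_t len, const char *prefix, int index)
{
    int n = snprintf(out, len, "%s.%d", prefix, index);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the values used in the example above (0, 4, 1), five tasks are created and their output goes to stdout.0 through stdout.4.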
./configure --enable-debug
export MADDEBUG=yes
ulimit -c unlimited
$GW_LOCATION/var/core.<process_pid>
• Lock file exists
GridWay finishes with the following message when you try to start it:
Error! Lock file <path_to_GridWay>/var/.lock exists.
Be sure that no other GWD is running, then remove the lock file and try again.
• Error in MAD initialization
GridWay finishes with the following message when you try to start it:
Error in Execution MAD prews initialization, exiting. Check path, you have a valid proxy...
Check that you have generated a valid proxy (for example with the grid-proxy-info command). Also, check that the directory “$GW_LOCATION/bin” is in your path, and the executable name of all the MADs is defined in “gwd.conf”.
• Error contacting GWD
Client commands, like gwps, finish with the message:
connect(): Connection refused
Could not connect to gwd
Be sure that GWD is running (e.g. pgrep -l gwd). If it is running, check that you can connect to GWD (e.g. telnet localhost `cat $GW_LOCATION/var/gwd.port`).