
Middleware Access Drivers 5.6

Information manager MAD

To abstract the monitoring and discovery middleware layer (the Grid Information System), GridWay uses a Middleware Access Driver (MAD) module to discover and monitor hosts. This module provides the basic operations for interacting with the monitoring and discovery middleware.

The format to send a request to the Information MAD, through its standard input, is:

OPERATION HID HOST ARGS

Where:

  • OPERATION: Can be one of the following:
    • INIT: Initializes the MAD (e.g. INIT - - -).
    • DISCOVER: Discovers hosts (e.g. DISCOVER - - -).
    • MONITOR: Monitors a host (e.g. MONITOR HID HOST -).
    • FINALIZE: Finalizes the MAD (e.g. FINALIZE - - -).
  • HID: If the operation is MONITOR, it is a host identifier chosen by GridWay. Otherwise it is ignored.
  • HOST: If the operation is MONITOR, it specifies the host to monitor. Otherwise it is ignored.

In the other direction, the format of a response from the MAD, written to its standard output, is:

OPERATION HID RESULT INFO

Where:

  • OPERATION: Is the operation specified in the request that originated the response.
  • HID: It is the host identifier, as provided in the MONITOR request.
  • RESULT: It is the result of the operation. It can be SUCCESS or FAILURE.
  • INFO: If RESULT is FAILURE, it contains the cause of failure. Otherwise, if OPERATION is DISCOVER, it contains a list of discovered hosts, or if OPERATION is MONITOR, it contains a list of host attributes.
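
The request and response formats above can be illustrated with a toy driver. The following is a minimal sketch in Python, not GridWay's implementation: the host names, the attribute strings in HOSTS, the error messages and the function names handle_request and serve are all hypothetical, and a real driver would query an information service instead of a static table.

```python
# Toy Information MAD: answers "OPERATION HID HOST ARGS" requests with
# "OPERATION HID RESULT INFO" responses.
# HOSTS is made-up static data standing in for a real information service.
HOSTS = {
    "node1.example.org": 'HOSTNAME="node1.example.org" ARCH="i686" CPU_MHZ=3000',
    "node2.example.org": 'HOSTNAME="node2.example.org" ARCH="x86_64" CPU_MHZ=2400',
}


def handle_request(line):
    """Return the response line for one request line."""
    fields = line.split()
    if len(fields) != 4:
        # Hypothetical error reply; the source does not define this case.
        return "- - FAILURE Malformed request"
    operation, hid, host, _args = fields
    if operation in ("INIT", "FINALIZE"):
        return f"{operation} - SUCCESS -"
    if operation == "DISCOVER":
        # INFO is the space-separated list of discovered hosts.
        return "DISCOVER - SUCCESS " + " ".join(HOSTS)
    if operation == "MONITOR":
        if host in HOSTS:
            # INFO is the list of host attributes; HID is echoed back.
            return f"MONITOR {hid} SUCCESS {HOSTS[host]}"
        return f"MONITOR {hid} FAILURE Unknown host {host}"
    return f"{operation} {hid} FAILURE Unknown operation"


def serve(lines, emit=print):
    """Answer each request in `lines` until FINALIZE has been handled."""
    for line in lines:
        if not line.strip():
            continue
        emit(handle_request(line))
        if line.split()[0] == "FINALIZE":
            break
```

Calling serve(sys.stdin) would answer requests typed on standard input. A real driver would likely also need to flush its output after each response so that GridWay sees it immediately.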

Table 2. Attributes that should be defined by the Information MADs.

Attribute Description
HOSTNAME FQDN (Fully Qualified Domain Name) of the execution host (e.g. “hydrus.dacya.ucm.es”)
ARCH Architecture of the execution host (e.g. “i686”, “alpha”)
OS_NAME Operating System name of the execution host (e.g. “Linux”, “SL”)
OS_VERSION Operating System version of the execution host (e.g. “2.6.9-1.66”, “3”)
CPU_MODEL CPU model of the execution host (e.g. “Intel(R) Pentium(R) 4 CPU 2”, “PIV”)
CPU_MHZ CPU speed in MHz of the execution host
CPU_FREE Percentage of free CPU of the execution host
CPU_SMP CPU SMP size of the execution host
NODECOUNT Total number of nodes of the execution host
SIZE_MEM_MB Total memory size in MB of the execution host
FREE_MEM_MB Free memory in MB of the execution host
SIZE_DISK_MB Total disk space in MB of the execution host
FREE_DISK_MB Free disk space in MB of the execution host
LRMS_NAME Name of local DRM system (job manager) for execution, usually not fork (e.g. “jobmanager-pbs”, “Pbs”, “jobmanager-sge”, “SGE”)
LRMS_TYPE Type of local DRM system for execution (e.g. “PBS”, “SGE”)
QUEUE_NAME[i] Name of queue i (e.g. “default”, “short”, “dteam”)
QUEUE_NODECOUNT[i] Total node count of queue i
QUEUE_FREENODECOUNT[i] Free node count of queue i
QUEUE_MAXTIME[i] Maximum wall time of jobs in queue i
QUEUE_MAXCPUTIME[i] Maximum CPU time of jobs in queue i
QUEUE_MAXCOUNT[i] Maximum count of jobs that can be submitted in one request to queue i
QUEUE_MAXRUNNINGJOBS[i] Maximum number of running jobs in queue i
QUEUE_MAXJOBSINQUEUE[i] Maximum number of queued jobs in queue i
QUEUE_DISPATCHTYPE[i] Dispatch type of queue i (e.g. “batch”, “immediate”)
QUEUE_PRIORITY[i] Priority of queue i
QUEUE_STATUS[i] Status of queue i (e.g. “active”, “production”)


The information drivers interface with the grid information services to collect resource attributes. End users can use these attributes in the requirement and rank expressions of the job template to filter, prioritize and select candidate hosts. GridWay can use as many information drivers simultaneously as needed. For example, it can use MDS2 and MDS4 services at the same time, so you can use resources from different Grids together. The drivers for MDS2 and MDS4 provide the variables described in Table 2. The information manager can also receive other parameters from the driver. The GridWay team has used additional parameters that can be important for improving application efficiency (HTC applications) and for job migration: BANDWIDTH, LATENCY, SPEC_INT, SPEC_FLOAT…
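
The filter-then-rank idea can be sketched in a few lines of Python. This is only an illustration of the concept: it uses plain Python functions in place of GridWay's requirement and rank expression syntax, and the host records and the select_hosts helper are made up for the example.

```python
# Host attributes as an Information MAD might report them (values are made up).
hosts = [
    {"HOSTNAME": "a.example.org", "ARCH": "i686", "CPU_MHZ": 3000, "FREE_MEM_MB": 512},
    {"HOSTNAME": "b.example.org", "ARCH": "i686", "CPU_MHZ": 2400, "FREE_MEM_MB": 2048},
    {"HOSTNAME": "c.example.org", "ARCH": "alpha", "CPU_MHZ": 3200, "FREE_MEM_MB": 4096},
]


def select_hosts(hosts, requirement, rank):
    """Keep hosts that satisfy `requirement`, best `rank` value first."""
    candidates = [host for host in hosts if requirement(host)]
    return sorted(candidates, key=rank, reverse=True)


# Requirement: i686 hosts with at least 512 MB free. Rank: faster CPU first.
ranked = select_hosts(
    hosts,
    requirement=lambda h: h["ARCH"] == "i686" and h["FREE_MEM_MB"] >= 512,
    rank=lambda h: h["CPU_MHZ"],
)
print([h["HOSTNAME"] for h in ranked])  # ['a.example.org', 'b.example.org']
```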

Using them by hand

You can start a MAD by hand. The example below runs the LDAP MAD against the server gilda-bdii.ct.infn.it, restricted to the Production hosts of the gilda Virtual Organization. In the transcript, $ is the shell prompt, < marks a line you type in, and > marks the answer from the MAD.

$ gw_im_mad_egee_ldap -s gilda-bdii.ct.infn.it -q "(GlueCEStateStatus=Production)(GlueCEAccessControlBaseRule=VO:gilda)"
< INIT - - -
> INIT - SUCCESS -
< DISCOVER - - -
> DISCOVER - SUCCESS ce1-egee.srce.hr gilda-01.pd.infn.it dgt01.ui.savba.sk vega-ce.ct.infn.it grid010.ct.infn.it ce.hpc.iit.bme.hu iceage-ce-01.ct.infn.it ce-edu.grid.acad.bg sirius-ce.ct.infn.it dc01.nesc.ed.ac.uk gn0.hpcc.sztaki.hu
< MONITOR - gn0.hpcc.sztaki.hu -
> MONITOR - SUCCESS HOSTNAME="gn0.hpcc.sztaki.hu" ARCH="i686" NODECOUNT=16 LRMS_NAME="jobmanager-lcgpbs" LRMS_TYPE="torque" OS_NAME="ScientificSL" OS_VERSION="Beryllium" CPU_MODEL="PentiumD" CPU_MHZ=3000 CPU_SMP=2 FREE_MEM_MB=1024 SIZE_MEM_MB=1024 QUEUE_NAME[0]="gilda" QUEUE_NODECOUNT[0]=16 QUEUE_FREENODECOUNT[0]=16 QUEUE_MAXTIME[0]=4320 QUEUE_MAXCPUTIME[0]=2880 QUEUE_MAXJOBSINQUEUE[0]=999999999 QUEUE_MAXRUNNINGJOBS[0]=999999999 QUEUE_STATUS[0]="Production" QUEUE_DISPATCHTYPE[0]="batch" QUEUE_PRIORITY[0]="1" QUEUE_JOBWAIT[0]="0" QUEUE_ACCESS[0]=":gilda:" 
< FINALIZE - - -
> FINALIZE - SUCCESS -
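
The INFO field of a MONITOR response, as in the transcript above, is a space-separated list of NAME=value pairs in which string values are quoted and per-queue attributes carry an index in square brackets. A minimal sketch of how a client might split it into host and queue attributes follows. The regular expression and the parse_attributes helper are assumptions made for illustration, not GridWay's own parser, and values containing embedded double quotes are not handled.

```python
import re

# NAME or NAME[i], then "=", then either a quoted string or a bare token.
ATTRIBUTE = re.compile(r'(\w+)(?:\[(\d+)\])?=("[^"]*"|\S+)')


def parse_attributes(info):
    """Split a MONITOR INFO string into (host_attributes, queues_by_index)."""
    host, queues = {}, {}
    for name, index, raw in ATTRIBUTE.findall(info):
        if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
            value = raw[1:-1]   # quoted values stay strings
        elif raw.isdigit():
            value = int(raw)    # bare digits become integers
        else:
            value = raw
        if index:
            queues.setdefault(int(index), {})[name] = value
        else:
            host[name] = value
    return host, queues


info = ('HOSTNAME="gn0.hpcc.sztaki.hu" ARCH="i686" NODECOUNT=16 '
        'QUEUE_NAME[0]="gilda" QUEUE_NODECOUNT[0]=16 QUEUE_PRIORITY[0]="1"')
host, queues = parse_attributes(info)
print(host["NODECOUNT"], queues[0]["QUEUE_NAME"])
```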

Execution manager MAD

To abstract the resource management middleware layer, GridWay uses a Middleware Access Driver (MAD) module to submit, control and monitor the execution of jobs. This module provides the basic operations for interacting with the resource management middleware.

The format to send a request to the Execution MAD, through its standard input, is:

OPERATION JID HOST/JM RSL

Where:

  • OPERATION: Can be one of the following:
    • INIT: Initializes the MAD.
    • SUBMIT: Submits a job.
    • POLL: Polls a job to obtain its state.
    • CANCEL: Cancels a job.
    • FINALIZE: Finalizes the MAD.
  • JID: Is a job identifier, chosen by GridWay.
  • HOST: If the operation is SUBMIT, it specifies the resource contact to submit the job. Otherwise it is ignored.
  • JM: If the operation is SUBMIT, it specifies the job manager to submit the job. Otherwise it is ignored.
  • RSL: If the operation is SUBMIT, it specifies the resource specification to submit the job. Otherwise it is ignored.

In the other direction, the format of a response from the MAD, written to its standard output, is:

OPERATION JID RESULT INFO

Where:

  • OPERATION: Is the operation specified in the request that originated the response or CALLBACK, in the case of an asynchronous notification of a state change.
  • JID: It is the job identifier, as provided in the submission request.
  • RESULT: It is the result of the operation. Could be SUCCESS or FAILURE.
  • INFO: If RESULT is FAILURE, it contains the cause of failure. Otherwise, if OPERATION is POLL or CALLBACK, it contains the state of the job.
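
A client reading these responses has to keep the INFO field whole, since a failure cause may contain spaces, and it has to treat CALLBACK like POLL when extracting the job state. The sketch below shows one way to do that in Python. The Response type and the parse_response and job_state helpers are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional


class Response(NamedTuple):
    operation: str
    jid: str
    result: str
    info: str


def parse_response(line: str) -> Response:
    """Parse one "OPERATION JID RESULT INFO" line from an Execution MAD."""
    # Split on the first three runs of whitespace only, so INFO keeps its spaces.
    parts = line.strip().split(None, 3)
    if len(parts) < 3:
        raise ValueError(f"malformed response: {line!r}")
    operation, jid, result = parts[:3]
    if result not in ("SUCCESS", "FAILURE"):
        raise ValueError(f"unexpected result: {result!r}")
    info = parts[3] if len(parts) == 4 else ""
    return Response(operation, jid, result, info)


def job_state(response: Response) -> Optional[str]:
    """Return the job state from a successful POLL or CALLBACK, else None."""
    if response.result == "SUCCESS" and response.operation in ("POLL", "CALLBACK"):
        return response.info
    return None


print(job_state(parse_response("POLL 1 SUCCESS PENDING")))     # PENDING
print(job_state(parse_response("CALLBACK 1 SUCCESS ACTIVE")))  # ACTIVE
```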

Using them by hand

In this example, a globus-gass-server was started in another terminal on host ui-egee.dacya.ucm.es, port 34069, and /tmp/sleep.rsl is a test file containing the appropriate RSL description of the executable:

&(executable="/bin/sleep")(arguments="50")(stdout="https://ui-egee.dacya.ucm.es:34069//tmp/sleep.out")(stderr="https://ui-egee.dacya.ucm.es:34069//tmp/sleep.err")(environment=(GW_HOSTNAME "gilda-ce.rediris.es")(GW_USER "gwuser")(GW_JOB_ID 1)(GW_TASK_ID 0)(GW_ARRAY_ID -1)(GW_TOTAL_TASKS 0)(GW_RESTARTED 0))(queue="gilda")

We also have valid credentials for submitting to lcgce0.shef.ac.uk/jobmanager-lcgpbs:

$ gw_em_mad_prews
< INIT - - -
> INIT - SUCCESS -
< SUBMIT 1 gilda-ce.rediris.es/jobmanager-lcgpbs /tmp/job.rsl
> SUBMIT 1 SUCCESS https://gilda-ce.rediris.es:20008/22945/1266605386/
 (some time after)
< POLL 1  - -
> POLL 1 SUCCESS PENDING
< POLL 1  - -
> POLL 1 SUCCESS ACTIVE
 (50 seconds later)
> TIMER - SUCCESS Credential is valid until Fri Oct 16 00:15:36 2009
< FINALIZE - - -
> FINALIZE - SUCCESS -

Transfer manager MAD

To abstract the file transfer management middleware layer, GridWay uses a Middleware Access Driver (MAD) module to transfer job files. This module provides the basic operations for interacting with the file transfer middleware.

The format to send a request to the Transfer MAD, through its standard input, is:

OPERATION JID TID EXE_MODE SRC_URL DST_URL

Where:

  • OPERATION: Can be one of the following:
    • INIT: Initializes the MAD. JID should be the maximum number of jobs.
    • START: Starts the transfers associated with job JID.
    • END: Finishes the transfers associated with job JID.
    • MKDIR: Creates the directory SRC_URL.
    • RMDIR: Removes the directory SRC_URL.
    • CP: Starts a copy of SRC_URL to DST_URL, identified by TID and associated with job JID.
    • FINALIZE: Finalizes the MAD.
  • JID: Is a job identifier, chosen by GridWay.
  • TID: Is a transfer identifier. Only relevant for the CP operation.
  • EXE_MODE: If equal to 'X', the file is given execution permissions. Only relevant for the CP operation.

In the other direction, the format of a response from the MAD, written to its standard output, is:

OPERATION JID TID RESULT INFO

Where:

  • OPERATION: Is the operation specified in the request that originated the response or CALLBACK, in the case of an asynchronous notification of a state change.
  • JID: It is the job identifier, as provided in the START request.
  • TID: It is the transfer identifier, as provided in the CP request.
  • RESULT: It is the result of the operation. Could be SUCCESS or FAILURE.
  • INFO: If RESULT is FAILURE, it contains the cause of failure.
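
As an illustration of the request format, the following Python sketch builds the request lines for staging the files of one job: a START, one CP per file, and an END. It rests on assumptions that the text above does not confirm: unused fields are filled with "-" (as in the Information and Execution MAD examples), a non-executable copy also uses "-" for EXE_MODE, and the URLs are invented. The transfer_request and staging_requests helpers are hypothetical.

```python
def transfer_request(operation, jid="-", tid="-", exe_mode="-",
                     src_url="-", dst_url="-"):
    """Format one "OPERATION JID TID EXE_MODE SRC_URL DST_URL" request line.

    Assumption: fields that do not apply to the operation are sent as "-".
    """
    fields = (operation, jid, tid, exe_mode, src_url, dst_url)
    return " ".join(str(field) for field in fields)


def staging_requests(jid, files):
    """Build the request lines that copy `files` for job `jid`.

    `files` is a list of (src_url, dst_url, executable) tuples.
    """
    requests = [transfer_request("START", jid)]
    for tid, (src_url, dst_url, executable) in enumerate(files):
        # 'X' asks for execution permissions on the copied file.
        exe_mode = "X" if executable else "-"
        requests.append(
            transfer_request("CP", jid, tid, exe_mode, src_url, dst_url))
    requests.append(transfer_request("END", jid))
    return requests


for line in staging_requests(7, [
    ("file:///tmp/run.sh", "gsiftp://host.example.org/tmp/run.sh", True),
    ("file:///tmp/input.dat", "gsiftp://host.example.org/tmp/input.dat", False),
]):
    print(line)
```

Because CP only starts a copy, a client would match each later response to its request through the JID and TID fields that the MAD echoes back.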