To abstract the monitoring and discovery middleware layer (the Grid Information System), GridWay uses a Middleware Access Driver (MAD) module to discover and monitor hosts. This module provides the basic operations for interacting with the monitoring and discovery middleware.
Requests are sent to the Information MAD through its standard input in the following format:
OPERATION HID HOST ARGS
Where OPERATION is one of the following; fields that an operation does not use are sent as a dash:
INIT, sent as INIT - - -
DISCOVER, sent as DISCOVER - - -
MONITOR, sent as MONITOR HID HOST -
FINALIZE, sent as FINALIZE - - -
The MAD sends its response through its standard output in the following format:
OPERATION HID RESULT INFO
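Where RESULT reports the outcome of the operation (SUCCESS in the session shown later in this section) and INFO carries the returned data: the list of discovered hosts for DISCOVER, the host attributes listed in Table 2 for MONITOR, or a dash when there is nothing to return.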
Table 2. Attributes that should be defined by the Information MADs.
Attribute | Description |
---|---|
HOSTNAME | FQDN (Fully Qualified Domain Name) of the execution host (e.g. “hydrus.dacya.ucm.es”) |
ARCH | Architecture of the execution host (e.g. “i686”, “alpha”) |
OS_NAME | Operating System name of the execution host (e.g. “Linux”, “SL”) |
OS_VERSION | Operating System version of the execution host (e.g. “2.6.9-1.66”, “3”) |
CPU_MODEL | CPU model of the execution host (e.g. “Intel(R) Pentium(R) 4 CPU 2”, “PIV”) |
CPU_MHZ | CPU speed in MHz of the execution host |
CPU_FREE | Percentage of free CPU of the execution host |
CPU_SMP | CPU SMP size of the execution host |
NODECOUNT | Total number of nodes of the execution host |
SIZE_MEM_MB | Total memory size in MB of the execution host |
FREE_MEM_MB | Free memory in MB of the execution host |
SIZE_DISK_MB | Total disk space in MB of the execution host |
FREE_DISK_MB | Free disk space in MB of the execution host |
LRMS_NAME | Name of local DRM system (job manager) for execution, usually not fork (e.g. “jobmanager-pbs”, “Pbs”, “jobmanager-sge”, “SGE”) |
LRMS_TYPE | Type of local DRM system for execution (e.g. “PBS”, “SGE”) |
QUEUE_NAME[i] | Name of queue i (e.g. “default”, “short”, “dteam”) |
QUEUE_NODECOUNT[i] | Total node count of queue i |
QUEUE_FREENODECOUNT[i] | Free node count of queue i |
QUEUE_MAXTIME[i] | Maximum wall time of jobs in queue i |
QUEUE_MAXCPUTIME[i] | Maximum CPU time of jobs in queue i |
QUEUE_MAXCOUNT[i] | Maximum count of jobs that can be submitted in one request to queue i |
QUEUE_MAXRUNNINGJOBS[i] | Maximum number of running jobs in queue i |
QUEUE_MAXJOBSINQUEUE[i] | Maximum number of queued jobs in queue i |
QUEUE_DISPATCHTYPE[i] | Dispatch type of queue i (e.g. “batch”, “immediate”) |
QUEUE_PRIORITY[i] | Priority of queue i |
QUEUE_STATUS[i] | Status of queue i (e.g. “active”, “production”) |
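
To illustrate how these attributes look on the wire, the sketch below parses a string of NAME=value pairs such as the one carried by the MONITOR response in the example session of this section. The function name `parse_host_attributes` and the decision to group the indexed `QUEUE_*[i]` attributes into one dictionary per queue are assumptions made for this example; they are not part of GridWay.

```python
import re

# NAME, optional [index], then either a double-quoted string or a bare token.
_ATTR_RE = re.compile(r'(\w+)(?:\[(\d+)\])?=(?:"([^"]*)"|(\S+))')


def parse_host_attributes(info):
    """Split an INFO string into host-level and per-queue attributes."""
    host, queues = {}, {}
    for name, index, quoted, bare in _ATTR_RE.findall(info):
        if bare:
            # Unquoted values are integers in the example output;
            # anything else is kept as text.
            value = int(bare) if bare.isdigit() else bare
        else:
            value = quoted
        if index:
            # Indexed attributes (e.g. QUEUE_NAME[0]) belong to queue i.
            queues.setdefault(int(index), {})[name] = value
        else:
            host[name] = value
    return host, queues


if __name__ == "__main__":
    sample = ('HOSTNAME="gn0.hpcc.sztaki.hu" ARCH="i686" NODECOUNT=16 '
              'QUEUE_NAME[0]="gilda" QUEUE_FREENODECOUNT[0]=16')
    host, queues = parse_host_attributes(sample)
    print(host["HOSTNAME"], host["NODECOUNT"], queues[0]["QUEUE_NAME"])
```

Quoted values are kept as strings even when they look numeric (for instance a quoted priority), so the caller decides how to interpret them.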
The Information drivers interface with the Grid information services to collect resource attributes. End users can reference these attributes in the requirement and rank expressions of the job template to filter, prioritize and select candidate hosts. GridWay can use as many Information drivers as needed at the same time. For example, it can use MDS2 and MDS4 services simultaneously, so resources from different Grids can be used together.
The drivers for MDS2 and MDS4 provide the attributes described in Table 2. The information manager can also receive other attributes from a driver. The GridWay team has used additional attributes that can be important for improving application efficiency (HTC applications) and for job migration: BANDWIDTH, LATENCY, SPEC_INT, SPEC_FLOAT…
You can start a MAD by hand. The following example runs the LDAP MAD against the server gilda-bdii.ct.infn.it, restricted to the Production hosts of the gilda Virtual Organization. In the session, $ is the shell prompt, < marks a line you type in, and > marks the answer from the MAD.
    $ gw_im_mad_egee_ldap -s gilda-bdii.ct.infn.it -q "(GlueCEStateStatus=Production)(GlueCEAccessControlBaseRule=VO:gilda)"
    < INIT - - -
    > INIT - SUCCESS -
    < DISCOVER - - -
    > DISCOVER - SUCCESS ce1-egee.srce.hr gilda-01.pd.infn.it dgt01.ui.savba.sk vega-ce.ct.infn.it grid010.ct.infn.it ce.hpc.iit.bme.hu iceage-ce-01.ct.infn.it ce-edu.grid.acad.bg sirius-ce.ct.infn.it dc01.nesc.ed.ac.uk gn0.hpcc.sztaki.hu
    < MONITOR - gn0.hpcc.sztaki.hu -
    > MONITOR - SUCCESS HOSTNAME="gn0.hpcc.sztaki.hu" ARCH="i686" NODECOUNT=16 LRMS_NAME="jobmanager-lcgpbs" LRMS_TYPE="torque" OS_NAME="ScientificSL" OS_VERSION="Beryllium" CPU_MODEL="PentiumD" CPU_MHZ=3000 CPU_SMP=2 FREE_MEM_MB=1024 SIZE_MEM_MB=1024 QUEUE_NAME[0]="gilda" QUEUE_NODECOUNT[0]=16 QUEUE_FREENODECOUNT[0]=16 QUEUE_MAXTIME[0]=4320 QUEUE_MAXCPUTIME[0]=2880 QUEUE_MAXJOBSINQUEUE[0]=999999999 QUEUE_MAXRUNNINGJOBS[0]=999999999 QUEUE_STATUS[0]="Production" QUEUE_DISPATCHTYPE[0]="batch" QUEUE_PRIORITY[0]="1" QUEUE_JOBWAIT[0]="0" QUEUE_ACCESS[0]=":gilda:"
    < FINALIZE - - -
    > FINALIZE - SUCCESS -
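
The exchange above suggests how a custom Information MAD could be structured: read one request per line, dispatch on the operation, and write one response line. The following is a minimal sketch under that assumption. The static `HOSTS` table and the function names are hypothetical, and the FAILURE result string is a guess (only SUCCESS appears in the session above); a real driver would query an information service instead.

```python
import sys

# Hypothetical static data standing in for a real information service.
HOSTS = {
    "node1.example.org": 'HOSTNAME="node1.example.org" ARCH="i686" NODECOUNT=4',
}


def handle_request(line):
    """Return the response line for one 'OPERATION HID HOST ARGS' request."""
    fields = line.strip().split(None, 3)
    # Pad missing trailing fields with '-' so short requests do not crash.
    operation, hid, host, _args = fields + ["-"] * (4 - len(fields))
    if operation in ("INIT", "FINALIZE"):
        return f"{operation} - SUCCESS -"
    if operation == "DISCOVER":
        return "DISCOVER - SUCCESS " + " ".join(HOSTS)
    if operation == "MONITOR" and host in HOSTS:
        # Echo the host identifier back, as the session above does.
        return f"MONITOR {hid} SUCCESS {HOSTS[host]}"
    # Assumed failure format; the session above only shows SUCCESS.
    return f"{operation} {hid} FAILURE -"


def serve(requests=sys.stdin, out=sys.stdout):
    """Answer requests line by line until FINALIZE; return how many were handled."""
    handled = 0
    for line in requests:
        if not line.strip():
            continue  # ignore blank lines
        response = handle_request(line)
        # Flush so the caller sees each answer immediately.
        print(response, file=out, flush=True)
        handled += 1
        if response.startswith("FINALIZE"):
            break
    return handled
```

Calling `serve()` at the end of the script turns it into a driver that reads standard input and writes standard output, so it can be exercised by hand in the same way as the session above.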
To abstract the resource management middleware layer, GridWay uses a Middleware Access Driver (MAD) module to submit, control and monitor the execution of jobs. This module provides the basic operations for interacting with the resource management middleware.
Requests are sent to the Execution MAD through its standard input in the following format:
OPERATION JID HOST/JM RSL
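Where JID identifies the job, HOST/JM names the target host and its job manager (e.g. gilda-ce.rediris.es/jobmanager-lcgpbs), and RSL is the path to a file with the RSL description of the job (e.g. /tmp/job.rsl). Fields that an operation does not use are sent as a dash.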
The MAD sends its response through its standard output in the following format:
OPERATION JID RESULT INFO
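Where RESULT reports the outcome of the operation and INFO carries the returned data. In the session shown below, for example, RESULT is SUCCESS, and INFO holds the job contact URL after SUBMIT and the job state (PENDING, ACTIVE) after POLL.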
In this example, a globus-gass-server was started in another terminal on host ui-egee.dacya.ucm.es, port 34069, and the file /tmp/sleep.rsl contains the appropriate RSL description of the executable:
    &(executable="/bin/sleep")(arguments="50")(stdout="https://ui-egee.dacya.ucm.es:34069//tmp/sleep.out")(stderr="https://ui-egee.dacya.ucm.es:34069//tmp/sleep.err")(environment=(GW_HOSTNAME "gilda-ce.rediris.es")(GW_USER "gwuser")(GW_JOB_ID 1)(GW_TASK_ID 0)(GW_ARRAY_ID -1)(GW_TOTAL_TASKS 0)(GW_RESTARTED 0))(queue="gilda")
We also have valid credentials for submitting to lcgce0.shef.ac.uk/jobmanager-lcgpbs:
    $ gw_em_mad_prews
    < INIT - - -
    > INIT - SUCCESS -
    < SUBMIT 1 gilda-ce.rediris.es/jobmanager-lcgpbs /tmp/job.rsl
    > SUBMIT 1 SUCCESS https://gilda-ce.rediris.es:20008/22945/1266605386/
    (some time after)
    < POLL 1 - -
    > POLL 1 SUCCESS PENDING
    < POLL 1 - -
    > POLL 1 SUCCESS ACTIVE
    (50 seconds later)
    > TIMER - SUCCESS Credential is valid until Fri Oct 16 00:15:36 2009
    < FINALIZE - - -
    > FINALIZE - SUCCESS -
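
On the receiving side of this exchange, each response line has to be split back into its four fields. One possible way to do that is sketched below. The `MadResponse` type and the `parse_response` function are illustrative names, not part of GridWay, and the sketch assumes that INFO is everything after the third field, as the TIMER line above suggests.

```python
from typing import NamedTuple, Optional


class MadResponse(NamedTuple):
    operation: str
    jid: Optional[int]  # None when the MAD sends '-' (no job involved)
    result: str
    info: str


def parse_response(line: str) -> MadResponse:
    """Parse one 'OPERATION JID RESULT INFO' line written by an Execution MAD."""
    # INFO may contain spaces, so split at most three times.
    fields = line.strip().split(None, 3)
    if len(fields) < 3:
        raise ValueError(f"malformed MAD response: {line!r}")
    operation, jid, result = fields[:3]
    info = fields[3] if len(fields) == 4 else "-"
    return MadResponse(operation, None if jid == "-" else int(jid), result, info)


if __name__ == "__main__":
    for raw in (
        "SUBMIT 1 SUCCESS https://gilda-ce.rediris.es:20008/22945/1266605386/",
        "POLL 1 SUCCESS PENDING",
        "TIMER - SUCCESS Credential is valid until Fri Oct 16 00:15:36 2009",
    ):
        print(parse_response(raw))
```

Limiting the split to three separators keeps free-text INFO payloads, such as the credential message, intact in a single field.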
To abstract the file transfer management middleware layer, GridWay uses a Middleware Access Driver (MAD) module to transfer job files. This module provides the basic operations for interacting with the file transfer middleware.
Requests are sent to the Transfer MAD through its standard input in the following format:
OPERATION JID TID EXE_MODE SRC_URL DST_URL
Where:
The MAD sends its response through its standard output in the following format:
OPERATION JID TID RESULT INFO
Where: