PYME.cluster.ruleserver module

This module defines a RESTful HTTP server, the PYME Rule Server, which manages task distribution across a cluster through the use of rules. A rule is essentially a json template which defines how individual tasks may be generated (by substitution into the template on the client side). Generating individual tasks on the client has the dual benefits of a) reducing the CPU load and memory requirements on the server and b) dramatically decreasing the network bandwidth used for task distribution.

The exposed REST API is as follows:

Endpoint                Verb    Brief Description
/add_integer_id_rule    POST    Add a new rule
/release_rule_tasks     POST    Release task IDs associated with a rule
/task_advertisments     GET     Retrieve a list of advertisements
/bid_on_tasks           POST    Submit bids on advertised tasks
/handin                 POST    Advise the rule server of task completion
/distributor/queues     GET     Get status information

Implementation details follow. The json format and parameters for the REST calls are defined in the docstrings of the python methods implementing each endpoint (documented below). Rather than running this file directly, the server is launched with the PYMERuleServer command (see PYME.cluster.PYMERuleServer), which runs the server in a separate thread and takes care of logging and zeroconf registration.
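
As a rough, illustrative sketch of driving this API from Python with the requests library: the server address, paths and frame count below are placeholders, and it is assumed here that URL query parameters map onto the keyword arguments of the endpoint methods documented below.

import json
import requests

RULESERVER = 'http://127.0.0.1:15346'  # placeholder - the real address is advertised via zeroconf

# An abbreviated localization template (see IntegerIDRule below for the full format)
template = '''{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "localization",
 "taskdef" : {"frameIndex" : {{taskID}}, "metadata" : "PYME-CLUSTER://path/to/series/metadata.json"},
 "inputs" : {"frames" : "PYME-CLUSTER://path/to/series.pcs"},
 "outputs" : {"fitResults" : "PYME-CLUSTER://__aggregate_h5r/path/to/analysis.h5r/FitResults"}
}'''

# create a rule for a 2000 frame series, releasing all tasks immediately
resp = requests.post(RULESERVER + '/add_integer_id_rule?max_tasks=2000&release_start=0&release_end=2000',
                     data=json.dumps({'template': template}),
                     headers={'Content-Type': 'application/json'}).json()
ruleID = resp['ruleID']   # response is of the form {"ok" : "True", "ruleID" : ...}

# poll overall status
print(requests.get(RULESERVER + '/distributor/queues').json())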

class PYME.cluster.ruleserver.Bid(bidder_id, cost)

Bases: tuple

Create new instance of Bid(bidder_id, cost)

property bidder_id

Alias for field number 0

property cost

Alias for field number 1

class PYME.cluster.ruleserver.IntegerIDRule(ruleID, task_template, inputs_by_task=None, max_task_ID=100000, task_timeout=600, rule_timeout=3600, on_completion=None)

Bases: Rule

A rule which generates tasks based on a template.

Parameters
ruleID : str

A unique ID for this rule

task_template : str

A json template for generating tasks (see the Templates section below)

inputs_by_task : list of dict

A list of dictionaries mapping recipe variable names to URIs to treat as inputs (recipes only)

max_task_ID : int

The maximum number of tasks that can be generated by this rule. If the final number of tasks is not known at creation, set this to a value which is guaranteed to be higher and then call mark_release_complete() after all tasks have been released / made available.

task_timeout : float

A timeout in seconds for each task. If a task takes longer than task_timeout seconds it is assumed that the node executing it has fallen over and we retry on a different node (up to a maximum number of times set by the 'ruleserver-retries' config option).

rule_timeout : float

A timeout in seconds from the last task we processed after which we assume that no more tasks are coming and we can delete the rule (to keep memory down in a long-term usage scenario).

on_completion : dict, optional

A dictionary describing a chained rule to run once this rule has finished (see the Rule Chaining section below).

Notes

Templates

Templates are strings into which values are substituted to generate tasks. There are currently two supported formats for templates, one for localization tasks and one for recipes. Each takes the form of a json dictionary:

Localization

{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "localization",
 "taskdef" : {"frameIndex" : {{taskID}}, "metadata" : "PYME-CLUSTER://path/to/series/metadata.json"},
 "inputs" : {"frames" : "PYME-CLUSTER://path/to/series.pcs"},
 "outputs" : {"fitResults" : "PYME-CLUSTER://__aggregate_h5r/path/to/analysis.h5r/FitResults",
              "driftResults" : "PYME-CLUSTER://__aggregate_h5r/path/to/analysis.h5r/DriftResults"}
}
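
For illustration, the {{ruleID}} and {{taskID}} placeholders are filled in client-side with simple string substitution - something along the lines of the hypothetical sketch below (the real worker code lives elsewhere in PYME and may differ):

import json

# the localization template above, held as a python string (abbreviated)
localization_template = '''{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "localization",
 "taskdef" : {"frameIndex" : {{taskID}}, "metadata" : "PYME-CLUSTER://path/to/series/metadata.json"},
 "inputs" : {"frames" : "PYME-CLUSTER://path/to/series.pcs"}
}'''

def render_task(template, ruleID, taskID):
    # naive placeholder substitution; illustrative only
    return json.loads(template.replace('{{ruleID}}', ruleID).replace('{{taskID}}', str(taskID)))

task = render_task(localization_template, 'example-rule-0001', 17)
assert task['id'] == 'example-rule-0001~17'
assert task['taskdef']['frameIndex'] == 17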

Recipes

The recipe can either be specified inline:

{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "recipe",
 "taskdef" : {"recipe" : "<recipe as a YAML string>"},
 "inputs" : {{taskInputs}},
 "output_dir" : "PYME-CLUSTER://path/to/output/dir",
}

or using a cluster URI:

{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "recipe",
 "taskdefRef" : "PYME-CLUSTER://path/to/recipe.yaml",
 "inputs" : {{taskInputs}},
 "output_dir" : "PYME-CLUSTER://path/to/output/dir",
}

The rule will substitute {{taskInputs}} with a dictionary mapping integer task IDs to recipe input files, e.g.

{0 : {"recipe_input_0" : "input_0_URI_0","recipe_input_1" : "input_1_URI_0"},
 1 : {"recipe_input_0" : "input_0_URI_1","recipe_input_1" : "input_1_URI_1"},
 2 : {"recipe_input_0" : "input_0_URI_2","recipe_input_1" : "input_1_URI_2"},
 }

Alternatively the inputs dictionary can be supplied directly (without relying on the taskInputs substitution).

Rule Chaining

Rules may also define a chained rule, to be run on completion of the original rule. This is accomplished by supplying an “on_completion” dictionary of the form {'template' : <template>, 'max_tasks' : max_tasks, 'rule_timeout' : timeout, 'on_completion' : {...}}, with the template following the format above and everything but the template being optional (max_tasks defaults to 1). Follow-on / chained rules are slightly more restricted than standard rules in that recipe “inputs” must be hardcoded (no {{taskInputs}} substitution).

Chained rules are created once the original rule is finished (see IntegerIDRule.finished). All tasks for the chained rule will be released immediately.
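
As a hypothetical illustration (the recipe path, inputs and output directory are placeholders), an on_completion dictionary chaining a single recipe task onto the original rule might look like:

on_completion = {
    'template': '''{
 "id" : "{{ruleID}}~{{taskID}}",
 "type" : "recipe",
 "taskdefRef" : "PYME-CLUSTER://path/to/postprocessing_recipe.yaml",
 "inputs" : {"fitResults" : "PYME-CLUSTER://path/to/analysis.h5r"},
 "output_dir" : "PYME-CLUSTER://path/to/output/dir"
}''',
    # note the hardcoded "inputs" - chained rules cannot use the {{taskInputs}} substitution
    # everything below is optional; max_tasks defaults to 1
    'max_tasks': 1,
    'rule_timeout': 3600,
    # 'on_completion' : {...}   # chains can be nested further if required
}

# passed either as the on_completion argument to IntegerIDRule, or as the
# "on_completion" entry of the json body posted to /add_integer_id_rule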

COST_THRESHOLD = 0.2
TASK_INFO_DTYPE = dtype([('status', 'u1'), ('nRetries', 'u1'), ('expiry', '<f4'), ('cost', '<f4')])
property advert

The task advertisement.

If the rule has tasks available, a dictionary of the form: {"ruleID" : str, "taskTemplate" : str, "availableTaskIDs" : [list of int], "inputsByTask" : [optional] dict mapping task IDs to inputs}

“inputsByTask” is only provided for some recipe tasks.

bid(bid)

Bid on tasks (and return any that match). Note that the current implementation is very naive and doesn’t check bid cost - i.e. the first bid gets the task. This works if (and only if) the clients are well behaved and preferentially bid on the tasks which have the lowest cost for them.

Parameters
bid : dict

A dictionary containing the ruleID, the IDs of the tasks to bid on, and their costs {"ruleID" : str ,"taskIDs" : [list of int],"taskCosts" : [list of float]}

Returns
successful_bids: dict

A dictionary containing the ruleID, the IDs of the tasks awarded, and the rule template {"ruleID" : ruleID, "taskIDs": [list of int], "template" : "<rule template>"}

property expired

Whether the rule has expired (no available tasks, no tasks assigned, and time > expiry) and can be removed

property finished

Whether the rule has finished (the maximum number of tasks which can be created have been completed).

inactivate()

Mark rule as inactive (generates no adverts) to facilitate aborting / pausing long-running rules.

info()

Get information / status about this rule

Returns
status : dict

A status dictionary of the form: {'tasksPosted': int, 'tasksRunning': int, 'tasksCompleted': int, 'tasksFailed' : int, 'averageExecutionCost' : float}

make_range_available(start, end)

Make a range of tasks, [start, end), available (to be called once the underlying data is available).

Parameters
start : int

first task number to release (inclusive)

end : int

last task number to release (exclusive)

Raises

RuntimeError

If asked to release a range which is invalid for the maximum number of tasks we can create from this rule.

mark_complete(info)

Mark a set of tasks as completed and/or failed

Parameters
info : dict

A dictionary of the form: {"ruleID": str, "taskIDs" : [list of int], "status" : [list of int]}

There should be an entry in status for each entry in taskIDs, the valid values of status being STATUS_COMPLETE=3 or STATUS_FAILED=4.

mark_release_complete(n_tasks=None)

Signal that all tasks which are going to be released for this rule have been, and that the rule should evaluate as finished once these tasks have completed. Used when the number of tasks is not known in advance (see also the __init__ docs).

FIXME - What happens when task release doesn’t start at 0?

Parameters
n_tasks : int, optional

Mark complete at a given number of tasks. The default is to mark complete at the highest task number which has been released prior to making this call. The use case for this optional parameter is to avoid the need for locking when this gets called from a different thread than make_range_available(); it implies a promise by the calling code that it will not release any more tasks above the given number.

Raises
ValueError

If n_tasks is supplied and is either smaller than the largest task already released or larger than max_task_ID.
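
A minimal sketch of the intended object-level usage when the final task count is unknown (illustrative only - in normal operation the RuleServer makes these calls in response to HTTP requests; the rule ID and template string are placeholders):

from PYME.cluster.ruleserver import IntegerIDRule

task_template = '...'  # a localization or recipe template string, as described in the Notes above

# over-allocate max_task_ID, since the final number of frames is not yet known
rule = IntegerIDRule('example-rule-0001', task_template, max_task_ID=int(1e6))

n_released = 0
for chunk_end in (500, 1000, 1500):  # e.g. as successive chunks of frames arrive from the spooler
    rule.make_range_available(n_released, chunk_end)
    n_released = chunk_end

# no more tasks will be released; the rule will evaluate as finished
# once the 1500 released tasks have completed
rule.mark_release_complete()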

poll_timeouts()
class PYME.cluster.ruleserver.Rule

Bases: object

class PYME.cluster.ruleserver.RuleServer

Bases: object

MAX_ADVERTISEMENTS = 30000
add_integer_id_rule(max_tasks=1000000.0, release_start=None, release_end=None, ruleID=None, timeout=3600, body='')

HTTP endpoint (POST) for adding a new rule.

Add a rule that generates tasks based on an integer ID, such as a frame number (localization) or an index into a list of inputs (recipes). By default, tasks are not released on creation, but later with release_rule_tasks(). This allows for the creation of a rule before a series has finished spooling.

Parameters
max_tasks : int

The maximum number of tasks that could match this rule. Generally the number of frames in a localization series (if known in advance) or the number of inputs on which to run a recipe. Allows us to pre-allocate a task array of the correct size (passing this rather than relying on the default reduces memory usage).

release_start, release_end : int

Release a range of tasks for computation after creating the rule (avoids a separate call to release_rule_tasks()).

ruleID : str

A unique identifier for the rule. If not provided, one will be automatically generated.

timeout : int

How long this rule should live for, in seconds. Defaults to an hour.

body : str

A json dictionary {'template' : "<rule template>", 'inputsByTask' : [list of URIs]}. See IntegerIDRule for the format of the rule template. The inputsByTask parameter is only used for recipes, and can be omitted for localisation analysis, or for recipe tasks using a hard-coded "inputs" dictionary. An optional “on_completion” parameter may also be given, itself consisting of a {'template': <template>, 'on_completion': {...}} dictionary (see Rule Chaining in IntegerIDRule).

Returns
result : json str

The result of adding the rule. A dict of the form: {"ok" : "True", 'ruleID' : str}

bid_on_tasks(body='')

HTTP endpoint (POST) for bidding on tasks.

Parameters
body : json list of bids

A list of bids, each of which is a dictionary of the form {"ruleID" : str, "taskIDs" : [list of int], "taskCosts" : [list of float]}. See IntegerIDRule.bid() for details.

Returns
assignments: json str

A json formatted list of task assignments, each of which has the form: {"ruleID" : ruleID, "taskIDs": [list of int], "template" : "<rule template>"}. See IntegerIDRule.bid().

get_queue_info()

A throttled version of the queue info.

get_queues()

HTTP Endpoint (GET) - visible at “distributor/queues” - for querying ruleserver status.

Returns
status: json str

A dictionary of the form {"ok" : True, "result" : {ruleID0 : rule0.info(), ruleID1 : rule1.info()}} See IntegerIDRule.info().

handin(body)

HTTP Endpoint (POST) to mark tasks as having been completed

Parameters
body : json list

A list of entries of the form {"ruleID": str, "taskIDs" : [list of int], "status" : [list of int]} where status has the same length as taskIDs with each entry being either STATUS_COMPLETE or STATUS_FAILED. See IntegerIDRule.mark_complete().

Returns
successjson str

{"ok" : "True"} if successful.

inactivate_rule(ruleID)
mark_release_complete(ruleID, n_tasks=None)

HTTP Endpoint (POST) to signal that no more tasks will be released for a rule and the rule can be regarded as finished once the previously released tasks are complete.

Parameters
ruleID : str

ID of the rule to update

n_tasks : int, optional (discouraged)

a fixed number of tasks to truncate the rule at. Used for avoiding locks/race conditions in a multi-threaded client process. NOTE - this parameter may disappear in a future version to enable rules with incomplete ranges to be marked as complete (see comments in IntegerIDRule.finished). Locking or otherwise structuring the client such that no release_rule_tasks() calls can occur after a call to mark_release_complete() is therefore preferred.

Returns
successstr

{"ok" : "True"} if successful.

Raises
ValueError

If n_tasks is less than the number of tasks already assigned.

FIXME - HTTP endpoints should not raise - they should handle the error and return an error response to the client.
release_rule_tasks(ruleID, release_start, release_end, body='')

HTTP Endpoint (POST or GET) for releasing tasks associated with a rule.

When performing spooling data analysis we typically have one rule per series and one task per frame. This method allows for the tasks associated with a particular frame (or range of frames) to be released as they become available.

Parameters
ruleID : str

The rule ID

release_start, release_end : int

The range of task IDs to release

body : str, empty

Needed for interface compatibility, ignored

Returns
successjson str

{"ok" : "True"} if successful.

status()
stop()
task_advertisements()

HTTP endpoint (GET) for retrieving task advertisements.

Note - by default, a limited number of advertisements are posted at any given time (to limit bandwidth). This should be enough to keep all the workers busy, with new adverts being posted once tasks are assigned to workers.

Returns
adverts : json str

List of advertisements. See IntegerIDRule.advert
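
A rough sketch of the worker side of the protocol (endpoint paths as listed in the table at the top of this page; the server address, the constant task cost, and the execute_task() helper are placeholders - real worker nodes are implemented elsewhere in PYME):

import json
import requests

RULESERVER = 'http://127.0.0.1:15346'   # placeholder - discovered via zeroconf in practice
STATUS_COMPLETE, STATUS_FAILED = 3, 4

adverts = requests.get(RULESERVER + '/task_advertisments').json()

# bid on everything we saw, with a constant (placeholder) cost per task
bids = [{'ruleID': ad['ruleID'],
         'taskIDs': ad['availableTaskIDs'],
         'taskCosts': [1.0] * len(ad['availableTaskIDs'])} for ad in adverts]

assignments = requests.post(RULESERVER + '/bid_on_tasks', data=json.dumps(bids)).json()

for a in assignments:
    statuses = []
    for taskID in a['taskIDs']:
        # naive placeholder substitution; templates using {{taskInputs}} would also
        # need that placeholder substituted from the advert's inputsByTask
        task = json.loads(a['template'].replace('{{ruleID}}', a['ruleID'])
                                       .replace('{{taskID}}', str(taskID)))
        ok = execute_task(task)   # hypothetical helper - run the localization / recipe task
        statuses.append(STATUS_COMPLETE if ok else STATUS_FAILED)

    # hand the results back so the rule server can update its book-keeping
    requests.post(RULESERVER + '/handin',
                  data=json.dumps([{'ruleID': a['ruleID'],
                                    'taskIDs': a['taskIDs'],
                                    'status': statuses}]))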

class PYME.cluster.ruleserver.ServerThread(port, bind_addr='', profile=False)

Bases: Thread

This constructor should always be called with keyword arguments. Arguments are:

group should be None; reserved for future extension when a ThreadGroup class is implemented.

target is the callable object to be invoked by the run() method. Defaults to None, meaning nothing is called.

name is the thread name. By default, a unique name is constructed of the form “Thread-N” where N is a small decimal number.

args is the argument tuple for the target invocation. Defaults to ().

kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.

If a subclass overrides the constructor, it must make sure to invoke the base class constructor (Thread.__init__()) before doing anything else to the thread.

run()
shutdown()
class PYME.cluster.ruleserver.WFRuleServer(port, bind_addr='')

Bases: APIHTTPServer, RuleServer

Combines the RuleServer with its web framework.

Largely an artifact of initial experiments using cherrypy (this structure allowed quick switching between cherrypy and our internal webframework).

PYME.cluster.ruleserver.on_SIGHUP(signum, frame)