PYME.cluster.rules module

Refactored rule pushing. Introduces rule classes which act as Python proxies for the JSON rule objects, and rule factories for use when constructing equivalent rules (or chains of rules) for multiple series.

Design principles as follows:

  • a Rule object is a 1:1 mapping with rules on the ruleserver

  • you create a new Rule object for each rule you push to the server

  • a pattern for rule creation is expressed using a RuleFactory

  • calling `.get_rule()` on the first step returns a fully linked rule suitable for submission

  • very limited inference of inputs etc … between steps - rely on specifying inputs and outputs using patterns instead

Examples

>>> step1 = LocalisationRuleFactory(analysisMetadata=mdh)
>>> step2 = RecipeRuleFactory(recipeURI='PYME-CLUSTER///RECIPES/render_image.yaml', input_patterns={'input':'{{spool_dir}}/analysis/{{series_stub}}.h5r'})
>>> step1.chain(step2)
>>> step3 = RecipeRuleFactory(recipeURI='PYME-CLUSTER///RECIPES/measure_blobs.yaml', input_patterns={'input':'{{spool_dir}}/analysis/{{series_stub}}.tif'})
>>> step2.chain(step3)

or

>>> step1 = LocalisationRuleFactory(analysisMetadata=mdh,
>>>                                 on_completion=RecipeRuleFactory(recipeURI='PYME-CLUSTER///RECIPES/render_image.yaml',
>>>                                                                 input_patterns={'input':'{{spool_dir}}/analysis/{{series_stub}}.h5r'},
>>>                                                                 on_completion=RecipeRuleFactory(recipeURI='PYME-CLUSTER///RECIPES/measure_blobs.yaml',
>>>                                                                                                 input_patterns={'input':'{{spool_dir}}/analysis/{{series_stub}}.tif'})))

then:

>>> def on_launch_analysis(context={'spool_dir': ..., 'series_stub': ...}):
>>>    step1.get_rule(context=context).push()
class PYME.cluster.rules.LocalisationRule(seriesName, analysisMetadata, resultsFilename=None, startAt=0, dataSourceModule=None, serverfilter='runnervm0kj6c', **kwargs)

Bases: Rule

Create a new rule. Sub-classed for specific rule types

Parameters
on_completion : RuleFactory instance

A rule to run after this one has completed (also settable using the .chain() method)

kwargs : any additional arguments, available in the context used when creating the rule template
property complete

Is this rule complete, or do we need to poll for more input?

Over-ridden in LocalisationRule.

property data_complete

Is the underlying data complete?

get_new_tasks()

Over-ridden in rules where all the data is not guaranteed to be present when the rule is created

Returns
release_start, release_end : the indices of the starting and ending tasks to release
on_data_complete()

Over-ride in derived rules so that, e.g. events can be written at the end of a real-time acquisition

prepare()

Do any setup work - e.g. uploading metadata required before the rule is triggered

Returns
post_args : dict

a dictionary with arguments to pass to RulePusher._post_rule() - specifically timeout, max_tasks, release_start, release_end

property total_frames
class PYME.cluster.rules.LocalisationRuleFactory(**kwargs)

Bases: RuleFactory

See LocalisationRule for full initialization arguments. Required kwargs are:

seriesName : str
analysisMetadata : PYME.IO.MetaDataHandler.MDHandlerBase

exception PYME.cluster.rules.NoNewTasks

Bases: Exception

class PYME.cluster.rules.RecipeRule(recipe=None, recipeURI=None, output_dir=None, **kwargs)

Bases: Rule

Create a recipe rule

Parameters
recipe : str or recipes.modules.ModuleCollection

The recipe as YAML text or as a recipe instance (alternatively provide recipeURI)

recipeURI : str

A cluster URI for the recipe text (if recipe not provided directly)

output_dir : str

The directory to put the recipe output TODO: should this be templated based on context?

kwargs : dict

Parameters for the base Rule class (notably on_completion which, if provided, should be a RuleFactory instance). Additional parameters not consumed by Rule are accessible in the context used for creating rule templates.

One (and only one) of the following keyword parameters should be provided to specify the recipe inputs:
inputs : dict

keys are recipe namespace keys, values are either lists of file URIs, or globs which will be expanded to a list of URIs. Corresponds to the inputsByTask property in the recipe description (see ruleserver docs). inputsByTask will be used by the server to populate the inputs for individual tasks

input_templates : dict

similar to inputs, except that dictionary substitution with the rule context is performed on the values before they are written to inputsByTask

rule_outputs : dict

used with chained recipes, this is the outputs of the previous recipe step. Unlike inputs and input_templates this is a dict of str, rather than of list (or glob-implied list) and is written directly to the “inputs” section of the template, rather than to the inputsByTask property of the rule, short-circuiting server side input filling.

TODO - support for templated recipes? Subclass?
prepare()

Do any setup work - e.g. uploading metadata required before the rule is triggered

Returns
post_args : dict

a dictionary with arguments to pass to RulePusher._post_rule() - specifically timeout, max_tasks, release_start, release_end

class PYME.cluster.rules.RecipeRuleFactory(**kwargs)

Bases: RuleFactory

Create a new rule factory. Sub-classed for specific rule types

Parameters
on_completion : RuleFactory instance

A rule to run after this one has completed (also settable using the .chain() method)

kwargs : any additional arguments (ignored in the base class)
class PYME.cluster.rules.Rule(on_completion=None, **kwargs)

Bases: object

Create a new rule. Sub-classed for specific rule types

Parameters
on_completion : RuleFactory instance

A rule to run after this one has completed (also settable using the .chain() method)

kwargs : any additional arguments, available in the context used when creating the rule template
cleanup()
property complete

Is this rule complete, or do we need to poll for more input?

Over-ridden in LocalisationRule.

property data_complete

Is the underlying data complete?

get_new_tasks()

Over-ridden in rules where all the data is not guaranteed to be present when the rule is created

Returns
release_start, release_end : the indices of the starting and ending tasks to release
join()
on_data_complete()

Over-ride in derived rules so that, e.g. events can be written at the end of a real-time acquisition

property output_files
prepare()

Do any setup work - e.g. uploading metadata required before the rule is triggered

Returns
post_args : dict

a dictionary with arguments to pass to RulePusher._post_rule() - specifically timeout, max_tasks, release_start, release_end

push()
status()
watch()
class PYME.cluster.rules.RuleFactory(on_completion=None, rule_class=<class 'PYME.cluster.rules.Rule'>, **kwargs)

Bases: object

Create a new rule factory. Sub-classed for specific rule types

Parameters
on_completion : RuleFactory instance

A rule to run after this one has completed (also settable using the .chain() method)

kwargs : any additional arguments (ignored in the base class)
chain(on_completion)

Chain a rule (as an alternative to passing on_completion to the rule constructor); this allows us to create rule chains from front to back rather than back to front

Parameters
on_completion : RuleFactory instance

The rule to run after this one has completed

get_rule(context)

Populate a rule using series-specific info from context. Note that the rule class initialization arguments should be passed in the RuleFactory initialization as kwargs, but can also be passed here in context if, e.g., the series name is not known when creating the factory.

Parameters
context : dict

a dictionary containing series-specific info to populate into the task template

Returns
Rule

a rule suitable for submitting to the ruleserver /add_integer_id_rule endpoint

property rule_type
class PYME.cluster.rules.RuleGroupWatcher

Bases: object

Single object / thread to watch multiple rules

Used when launching rules from Jupyter notebooks; this will show an updating display of rule progress

register()
watch(rule)
class PYME.cluster.rules.RuleWatcher

Bases: object

Single object / thread to watch multiple rules

watch(rule)
class PYME.cluster.rules.SpoolLocalLocalizationRule(spooler, seriesName, analysisMetadata, resultsFilename=None, startAt=0, serverfilter='runnervm0kj6c', **kwargs)

Bases: LocalisationRule

Create a new rule. Sub-classed for specific rule types

Parameters
on_completion : RuleFactory instance

A rule to run after this one has completed (also settable using the .chain() method)

kwargs : any additional arguments, available in the context used when creating the rule template
property data_complete

Is the underlying data complete?

on_data_complete()

Over-ride in derived rules so that, e.g. events can be written at the end of a real-time acquisition

property total_frames
class PYME.cluster.rules.SpoolLocalLocalizationRuleFactory(**kwargs)

Bases: RuleFactory

See SpoolLocalLocalizationRule for full initialization arguments. Required kwargs are:

seriesName : str
analysisMetadata : PYME.IO.MetaDataHandler.MDHandlerBase

PYME.cluster.rules.verify_cluster_results_filename(resultsFilename)

Checks whether a results file already exists on the cluster, and returns an available version of the results filename. Should be called before writing a new results file.

Parameters
resultsFilename : str

cluster path, e.g. pyme-cluster:///example_folder/name.h5r

Returns
resultsFilename : str

cluster path which may have _# appended to it if the input resultsFilename is already in use, e.g. pyme-cluster:///example_folder/name_1.h5r