Evaluation

Tools to get train/test splits and compute evaluation metrics.

phyre.metrics.MAIN_EVAL_SETUPS = ('ball_cross_template', 'ball_within_template', 'two_balls_cross_template', 'two_balls_within_template')

List of valid evaluation setups for phyre.

phyre.metrics.MAX_TEST_ATTEMPTS = 100

Maximum number of attempts agents are allowed to make per specific task.

phyre.get_fold(eval_setup: str, seed: int) → Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]

Get seed’th fold for specified evaluation setup.

Parameters
  • eval_setup – The name of the evaluation setup to use. E.g., ball_cross_template.

  • seed – The random seed to create the fold.

Returns

Tuple (train_ids, dev_ids, test_ids)

Contains task ids to use for each split.

Raises

ValueError – eval_setup is not a valid evaluation setup.
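A minimal sketch of how the returned tuple might be consumed. To keep it runnable without PHYRE installed, summarize_fold is a hypothetical helper that receives the fold function as an argument, and fake_get_fold with its task ids is a stand-in invented for illustration. In real use one would presumably pass phyre.get_fold instead.

```python
from typing import Callable, Dict, Tuple

Fold = Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]


def summarize_fold(get_fold: Callable[[str, int], Fold],
                   eval_setup: str, seed: int) -> Dict[str, int]:
    """Hypothetical helper: unpack a fold and report the size of each split."""
    train_ids, dev_ids, test_ids = get_fold(eval_setup, seed)
    return {"train": len(train_ids), "dev": len(dev_ids), "test": len(test_ids)}


def fake_get_fold(eval_setup: str, seed: int) -> Fold:
    """Stand-in for phyre.get_fold; the task ids below are made up."""
    if eval_setup != "ball_cross_template":
        # Mirrors the documented behaviour of rejecting unknown setups.
        raise ValueError(f"Unknown eval setup: {eval_setup}")
    return (("00000:000", "00000:001"), ("00001:000",), ("00002:000",))


sizes = summarize_fold(fake_get_fold, "ball_cross_template", seed=0)
print(sizes)
```

Passing the fold function in as a parameter is only a convenience for this sketch; it lets the unpacking logic be exercised with fake data.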

phyre.list_eval_setups() → Tuple[str, ...]

Get a list of names for all known eval setups.

phyre.eval_setup_to_action_tier(eval_setup_name: str) → str

Gets a default action tier for an eval setup.

class phyre.Evaluator(task_ids: Tuple[str])

Class for storing simulation results and calculating metrics.

compute_all_metrics() → Dict[str, Any]

Computes metrics based on recorded log of simulation results.

Returns

Dictionary mapping metric name to computed value.

get_attempts_for_task(task_index)

Parameters

task_index – index into task_ids of task.

Returns

Number of recorded attempts on task_index.

get_auccess(attempts: int = 100) → float

Calculates the AUCCESS metric.

Renamed from get_aucess to get_auccess in v0.0.1.1.

Parameters

attempts – Number of attempts to use for the calculation of AUCCESS; defaults to MAX_TEST_ATTEMPTS.

Returns

Result of AUCCESS calculation.
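For background, the PHYRE paper describes AUCCESS as a weighted average of the success percentage s_k within k attempts, with weights w_k = log(k + 1) - log(k). The following is an independent pure-Python sketch of that formula, not the library's implementation. The function name auccess_sketch and the input format (one entry per task giving the attempt at which it was first solved, 0 if never) are assumptions made for this example.

```python
import math
from typing import Sequence


def auccess_sketch(first_solved_at: Sequence[int], attempts: int = 100) -> float:
    """Illustrative AUCCESS computation (hypothetical helper).

    first_solved_at[i] is the 1-based attempt number at which task i was
    first solved, or 0 if it was never solved.
    """
    num_tasks = len(first_solved_at)
    if num_tasks == 0:
        raise ValueError("Need at least one task.")
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, attempts + 1):
        # Early attempts get more weight than late ones.
        weight = math.log(k + 1) - math.log(k)
        # Fraction of tasks solved within the first k attempts.
        solved_within_k = sum(1 for a in first_solved_at if 0 < a <= k)
        weighted_sum += weight * solved_within_k / num_tasks
        weight_total += weight
    return weighted_sum / weight_total


all_first_try = auccess_sketch([1, 1, 1])
none_solved = auccess_sketch([0, 0, 0])
half_first_try = auccess_sketch([1, 0])
```

Because the weights shrink as k grows, a task solved on the first attempt contributes more to the score than one solved on the hundredth.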

maybe_log_attempt(task_index: int, status: Union[int, str]) → bool

Logs status of attempt on task iff status is for a valid action.

Parameters
  • task_index – index into task_ids of task.

  • status – simulation status of attempt on task.

Returns

True if the attempt was logged (valid action, fewer than MAX_TEST_ATTEMPTS made on task), else False.

Raises

AssertionError – More than MAX_TEST_ATTEMPTS attempts were made on the task.
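The contract described above can be illustrated with a small stand-in class. This is a sketch written for this page, not phyre.Evaluator itself: TinyEvaluator and the string status labels ("invalid", "not_solved") are hypothetical, and only the documented behaviour (skip invalid actions, count valid ones, assert on the attempt limit) is modelled.

```python
from collections import Counter
from typing import Tuple

MAX_TEST_ATTEMPTS = 100  # value documented for phyre.metrics.MAX_TEST_ATTEMPTS


class TinyEvaluator:
    """Illustrative stand-in for the documented logging contract."""

    def __init__(self, task_ids: Tuple[str, ...]) -> None:
        self.task_ids = task_ids
        self._attempts: Counter = Counter()

    def get_attempts_for_task(self, task_index: int) -> int:
        return self._attempts[task_index]

    def maybe_log_attempt(self, task_index: int, status: str) -> bool:
        if status == "invalid":  # hypothetical label for an invalid action
            return False  # invalid actions are not counted as attempts
        assert self._attempts[task_index] < MAX_TEST_ATTEMPTS, (
            "Too many attempts on this task.")
        self._attempts[task_index] += 1
        return True


evaluator = TinyEvaluator(("task_a", "task_b"))
logged_invalid = evaluator.maybe_log_attempt(0, "invalid")
logged_valid = evaluator.maybe_log_attempt(0, "not_solved")
```

A caller that wants to avoid the AssertionError can check get_attempts_for_task against MAX_TEST_ATTEMPTS before simulating another action.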

property task_ids

Returns ordered list of task ids.