Evaluation

Tools to get train/test splits and compute evaluation metrics.

phyre.metrics.MAIN_EVAL_SETUPS = ('ball_cross_template', 'ball_within_template', 'two_balls_cross_template', 'two_balls_within_template'): List of valid evaluation setups for phyre.

phyre.metrics.MAX_TEST_ATTEMPTS = 100: Maximum number of attempts an agent is allowed to make per task.

phyre.get_fold(eval_setup: str, seed: int) → Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]

Get the seed-th fold for the specified evaluation setup.

Parameters

eval_setup – The name of the evaluation setup to use. E.g., ball_cross_template.
seed – The random seed to create the fold.

Returns

Tuple (train_ids, dev_ids, test_ids): Contains task ids to use for each split.

Raises

ValueError – eval_setup is not a valid evaluation setup.

phyre.list_eval_setups() → Tuple[str, ...]: Get a list of names for all known eval setups.

phyre.eval_setup_to_action_tier(eval_setup_name: str) → str: Gets a default action tier for an eval setup.

class phyre.Evaluator(task_ids: Tuple[str])

Class for storing simulation results and calculating metrics.

compute_all_metrics() → Dict[str, Any]

Computes metrics based on recorded log of simulation results.

get_attempts_for_task(task_index)

get_auccess(attempts: int = 100) → float

Calculates the AUCCESS metric.

Renamed from get_aucess to get_auccess in v0.0.1.1.

Parameters: attempts – Number of attempts to use for the calculation of AUCCESS; defaults to MAX_TEST_ATTEMPTS.
Returns: Result of AUCCESS calculation.
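This page does not spell out the AUCCESS formula. The sketch below assumes the definition given in the PHYRE paper: a weighted average of the fraction of tasks solved within k attempts, with weights w_k = log(k + 1) - log(k), so early attempts count more. The function auccess_sketch and its input convention are hypothetical and are not the library's implementation.

```python
import math
from typing import Sequence


def auccess_sketch(attempts_to_solve: Sequence[int], attempts: int = 100) -> float:
    """Weighted average of cumulative success rates over attempt budgets.

    attempts_to_solve[i] is the 1-based attempt at which task i was first
    solved, or 0 if it was never solved (a convention of this sketch only).
    """
    if not attempts_to_solve:
        raise ValueError("need at least one task")
    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, attempts + 1):
        weight = math.log(k + 1) - math.log(k)  # assumed weighting w_k
        # Fraction of tasks solved within the first k attempts.
        solved = sum(1 for a in attempts_to_solve if 0 < a <= k)
        weighted_sum += weight * solved / len(attempts_to_solve)
        weight_total += weight
    return weighted_sum / weight_total


# One task solved on the first attempt, one never solved.
print(auccess_sketch([1, 0]))
```

Under this assumed definition, a task solved on the first attempt contributes fully for every budget k, while a task first solved late contributes only for the budgets with small weights.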

maybe_log_attempt(task_index: int, status: Union[int, str]) → bool

Logs status of attempt on task iff status is for a valid action.

Parameters

Returns

True if the attempt was logged (valid action and fewer than MAX_TEST_ATTEMPTS attempts made on the task), else False.

Raises

AssertionError – More than MAX_TEST_ATTEMPTS attempts were made on the task.