Evaluation
Tools to get train/test splits and compute evaluation metrics.
-
phyre.metrics.MAIN_EVAL_SETUPS = ('ball_cross_template', 'ball_within_template', 'two_balls_cross_template', 'two_balls_within_template')
List of valid evaluation setups for phyre.
-
phyre.metrics.MAX_TEST_ATTEMPTS = 100
Maximum number of attempts an agent is allowed to make on each task.
-
phyre.get_fold(eval_setup: str, seed: int) → Tuple[Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]
Gets the seed-th fold for the specified evaluation setup.
- Parameters
eval_setup – The name of the evaluation setup to use, e.g., ball_cross_template.
seed – The random seed used to create the fold.
- Returns
Tuple (train_ids, dev_ids, test_ids) containing the task ids to use for each split.
- Raises
ValueError – eval_setup is not a valid evaluation setup.
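The call documented above can be wrapped as in the following minimal sketch. The helper name `load_fold` is hypothetical, and checking against a local copy of `MAIN_EVAL_SETUPS` is an assumption made for illustration: it is narrower than whatever `phyre.get_fold` itself accepts. The sketch assumes the `phyre` package is installed when the helper is called with a valid setup name.

```python
from typing import Tuple

# Local copy of the documented value of phyre.metrics.MAIN_EVAL_SETUPS.
MAIN_EVAL_SETUPS = (
    'ball_cross_template',
    'ball_within_template',
    'two_balls_cross_template',
    'two_balls_within_template',
)

TaskIds = Tuple[str, ...]


def load_fold(eval_setup: str, seed: int = 0) -> Tuple[TaskIds, TaskIds, TaskIds]:
    """Hypothetical helper: validate the setup name, then fetch one fold."""
    # Assumption: only the main setups are allowed here, so a typo fails
    # early with the same exception type that get_fold documents.
    if eval_setup not in MAIN_EVAL_SETUPS:
        raise ValueError(f'Unknown evaluation setup: {eval_setup!r}')

    # Imported lazily so the name check above works without the package.
    import phyre

    train_ids, dev_ids, test_ids = phyre.get_fold(eval_setup, seed)
    return train_ids, dev_ids, test_ids
```

With `phyre` installed, `load_fold('ball_cross_template', 0)` would return the three tuples of task ids described under Returns.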
-
phyre.list_eval_setups() → Tuple[str, ...]
Gets the names of all known eval setups.
-
phyre.eval_setup_to_action_tier(eval_setup_name: str) → str
Gets the default action tier for an eval setup.
-
class phyre.Evaluator(task_ids: Tuple[str])
Class for storing simulation results and calculating metrics.
-
compute_all_metrics() → Dict[str, Any]
Computes metrics based on the recorded log of simulation results.
- Returns
Dictionary mapping metric name to computed value.
-
get_attempts_for_task(task_index)
- Parameters
task_index – index into task_ids of the task.
- Returns
Number of attempts recorded on task_index.
-
get_auccess(attempts: int = 100) → float
Calculates the AUCCESS metric.
Renamed from get_aucess to get_auccess starting in v0.0.1.1.
- Parameters
attempts – Number of attempts to use for the calculation of AUCCESS; defaults to MAX_TEST_ATTEMPTS.
- Returns
Result of AUCCESS calculation.
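For intuition about what the metric measures, here is a standalone sketch of AUCCESS. It assumes the definition from the PHYRE paper: the success rate within k attempts, averaged over k with weights log(k + 1) - log(k). The function name `auccess` and its input format (the attempt number at which each task was first solved, or `None` if never solved) are illustrative choices, not the library's implementation.

```python
import math
from typing import Optional, Sequence

MAX_TEST_ATTEMPTS = 100  # documented value of phyre.metrics.MAX_TEST_ATTEMPTS


def auccess(first_solved_attempt: Sequence[Optional[int]],
            attempts: int = MAX_TEST_ATTEMPTS) -> float:
    """Weighted average of the success rate over attempt budgets 1..attempts.

    first_solved_attempt[i] is the 1-based attempt at which task i was first
    solved, or None if it was never solved (an assumed input format).
    """
    if not first_solved_attempt:
        raise ValueError('At least one task is required.')
    num_tasks = len(first_solved_attempt)

    weighted_sum = 0.0
    weight_total = 0.0
    for k in range(1, attempts + 1):
        # Early attempts get larger weights: w_k = log(k + 1) - log(k).
        weight = math.log(k + 1) - math.log(k)
        # s_k: fraction of tasks solved within the first k attempts.
        solved = sum(1 for a in first_solved_attempt
                     if a is not None and a <= k)
        weighted_sum += weight * solved / num_tasks
        weight_total += weight
    return weighted_sum / weight_total
```

Under this definition, an agent that solves every task on its first attempt scores 1, and an agent that solves nothing scores 0. Because the weights shrink as k grows, solving a task early contributes more than solving it late.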
-
maybe_log_attempt(task_index: int, status: Union[int, str]) → bool
Logs the status of an attempt on a task iff the status is for a valid action.
- Parameters
task_index – index into task_ids of the task.
status – simulation status of the attempt on the task.
- Returns
True if the attempt was logged (valid action, fewer than MAX_TEST_ATTEMPTS made on the task), else False.
- Raises
AssertionError – More than MAX_TEST_ATTEMPTS attempts were made on the task.
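The methods above are typically combined in an evaluation loop. The sketch below uses only the documented `Evaluator` members (`task_ids`, `get_attempts_for_task`, `maybe_log_attempt`, `get_auccess`). The function `run_evaluation`, the `simulate` callable, and the `actions` iterable are hypothetical stand-ins for whatever produces a simulation status in a given setup.

```python
from typing import Any, Callable, Iterable, Union

MAX_TEST_ATTEMPTS = 100  # documented value of phyre.metrics.MAX_TEST_ATTEMPTS


def run_evaluation(evaluator: Any,
                   simulate: Callable[[int, Any], Union[int, str]],
                   actions: Iterable[Any]) -> float:
    """Hypothetical loop: try actions on every task, then report AUCCESS.

    evaluator is expected to behave like phyre.Evaluator; simulate is an
    assumed callable mapping (task_index, action) to a simulation status.
    """
    actions = list(actions)  # reuse the same candidate actions for each task
    for task_index in range(len(evaluator.task_ids)):
        for action in actions:
            # Stop before the cap so the documented AssertionError on
            # exceeding MAX_TEST_ATTEMPTS is never triggered.
            if evaluator.get_attempts_for_task(task_index) >= MAX_TEST_ATTEMPTS:
                break
            status = simulate(task_index, action)
            # Invalid actions are not logged and do not count as attempts.
            evaluator.maybe_log_attempt(task_index, status)
    return evaluator.get_auccess()
```

This sketch keeps trying actions until the attempt budget is used up. A real agent would usually also move to the next task once the current one is solved, which needs a way to interpret the status value that is not covered in this section.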
-
property task_ids
Returns the ordered list of task ids.
-