rcognita.critics.CriticTabularPI
- class rcognita.critics.CriticTabularPI(*args, tolerance=0.001, N_update_iters_max=50, **kwargs)
- __init__(*args, tolerance=0.001, N_update_iters_max=50, **kwargs)
Initialize a new instance of the CriticTabularPI class.
- Parameters
args (tuple) – Positional arguments to pass to the superclass’s __init__ method.
tolerance (float) – The tolerance value for the update loop.
N_update_iters_max (int) – The maximum number of iterations for the update loop.
kwargs (dict) – Keyword arguments to pass to the superclass’s __init__ method.
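The two parameters above bound the critic's update loop. In a tabular policy-iteration critic, that loop is typically a policy-evaluation sweep over the value table. The sweep repeats until the largest per-cell change drops below tolerance, or until N_update_iters_max sweeps have run. The sketch below illustrates that stopping rule in plain NumPy. It is not rcognita's implementation. The helper evaluate_policy, the callables policy, transition, and running_cost, and the discount factor gamma are all hypothetical. The per-cell backup plays roughly the role that the update_single_cell description suggests.

```python
import numpy as np


def evaluate_policy(value_table, policy, transition, running_cost,
                    gamma=0.95, tolerance=1e-3, N_update_iters_max=50):
    """Hypothetical iterative tabular policy evaluation (not rcognita's API).

    Returns the updated table and the number of sweeps actually performed.
    """
    value_table = np.array(value_table, dtype=float)
    for n_iter in range(N_update_iters_max):
        old_table = value_table.copy()
        # One sweep: back up every cell from the previous sweep's values.
        for state in np.ndindex(value_table.shape):
            action = policy(state)
            next_state = transition(state, action)
            value_table[state] = running_cost(state, action) + gamma * old_table[next_state]
        # Stop early once the largest change is below the tolerance.
        if np.max(np.abs(value_table - old_table)) < tolerance:
            return value_table, n_iter + 1
    # Iteration cap reached without meeting the tolerance.
    return value_table, N_update_iters_max


# Usage: a 5-cell chain where the policy always steps left toward absorbing cell 0.
def move_left(state):
    return -1


def step(state, action):
    return (max(state[0] + action, 0),)


def cost(state, action):
    return 0.0 if state == (0,) else 1.0


V, n_sweeps = evaluate_policy(np.zeros(5), move_left, step, cost)
# n_sweeps == 5; V[4] is approximately 3.709875
```

The sweep reads every cell from a frozen copy of the previous table (a Jacobi-style update). This keeps the result independent of cell ordering. An in-place (Gauss-Seidel) sweep would usually converge in fewer sweeps.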
Methods
__init__(*args[, tolerance, N_update_iters_max]) – Initialize a new instance of the CriticTabularPI class.
accept_or_reject_weights(weights[, …]) – Determine whether to accept or reject the given weights based on whether they violate the given constraints.
cache_weights([weights]) – Stores a copy of the current model weights.
initialize_buffers() – Initialize the action and observation buffers with zeros.
objective(observation, action) – Calculate the value of a state given the action taken and the observation of the current state.
optimize_weights([time]) – Compute optimized critic weights, possibly subject to constraints.
reset() – Reset the outcome and current critic loss variables, and re-initialize the buffers.
restore_weights() – Restores the model weights to the cached weights.
update() – Update the value table.
update_and_cache_weights([weights]) – Update the model’s weights and cache the new values.
update_buffers(observation, action) – Updates the buffers of the critic with the given observation and action.
update_outcome(observation, action) – Update the outcome variable based on the running objective and the current observation and action.
update_single_cell(observation) – Update the value function for a single state.
update_target(new_target)
update_weights([weights]) – Update the weights of the critic model.
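Several of the methods listed above form a weight-management cycle. A candidate set of weights is checked against constraints and then either committed and cached or discarded by rolling back to the cached copy. The sketch below models that cycle in a standalone class. It is not rcognita's code. The class WeightCacheSketch and the cached_weights attribute are hypothetical. The convention that a constraint g counts as satisfied when g(weights) <= 0 is an assumption. So is the "accepted"/"rejected" return value. The real accept_or_reject_weights also takes further arguments that are elided here.

```python
import numpy as np


class WeightCacheSketch:
    """Hypothetical stand-in showing a cache / restore / accept-or-reject weight cycle."""

    def __init__(self, weights, constraints=()):
        self.weights = np.asarray(weights, dtype=float)
        self.cached_weights = self.weights.copy()
        # Assumption: each constraint g is satisfied when g(weights) <= 0.
        self.constraints = list(constraints)

    def cache_weights(self, weights=None):
        # Store a copy of the given weights, or of the current ones if none are given.
        source = self.weights if weights is None else np.asarray(weights, dtype=float)
        self.cached_weights = source.copy()

    def restore_weights(self):
        # Roll back to the last cached copy.
        self.weights = self.cached_weights.copy()

    def update_weights(self, weights=None):
        if weights is not None:
            self.weights = np.asarray(weights, dtype=float)

    def update_and_cache_weights(self, weights=None):
        self.update_weights(weights)
        self.cache_weights()

    def accept_or_reject_weights(self, weights):
        # Hypothetical return convention: "accepted" only if no constraint is violated.
        if all(g(weights) <= 0 for g in self.constraints):
            return "accepted"
        return "rejected"


# Usage: keep the L1 norm of the weights at or below 10.
critic = WeightCacheSketch([1.0, 2.0], constraints=[lambda w: np.sum(np.abs(w)) - 10.0])

for candidate in ([3.0, 4.0], [20.0, 0.0]):
    if critic.accept_or_reject_weights(candidate) == "accepted":
        critic.update_and_cache_weights(candidate)
    else:
        critic.restore_weights()
# critic.weights is now [3.0, 4.0]: the second candidate was rejected.
```

Caching a copy rather than a reference matters here. Without the copy, a later in-place change to the weights would silently alter the fallback values as well.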
Attributes
optimizer_engine
Returns the engine used by the optimizer.