rcognita.critics.CriticTabularPI

class rcognita.critics.CriticTabularPI(*args, tolerance=0.001, N_update_iters_max=50, **kwargs)
__init__(*args, tolerance=0.001, N_update_iters_max=50, **kwargs)

Initialize a new instance of the CriticTabularPI class.

Parameters
  • args (tuple) – Positional arguments to pass to the superclass’s __init__ method.

  • tolerance (float) – The tolerance value for the update loop.

  • N_update_iters_max (int) – The maximum number of iterations for the update loop.

  • kwargs (dict) – Keyword arguments to pass to the superclass’s __init__ method.
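
A minimal construction sketch is given below. The positional and keyword arguments forwarded to the superclass are placeholders here (parent_args, parent_kwargs), since their exact form is defined by the parent critic class rather than documented for CriticTabularPI itself.

    from rcognita.critics import CriticTabularPI

    # Sketch only: parent_args / parent_kwargs stand in for whatever the
    # superclass's __init__ expects; they are not documented on this page.
    critic = CriticTabularPI(
        *parent_args,
        tolerance=1e-3,          # tolerance for the update loop
        N_update_iters_max=50,   # maximum number of update-loop iterations
        **parent_kwargs,
    )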

Methods

__init__(*args[, tolerance, N_update_iters_max])

Initialize a new instance of the CriticTabularPI class.

accept_or_reject_weights(weights[, …])

Accept or reject the given weights depending on whether they violate the constraints.

cache_weights([weights])

Store a copy of the current model weights.

initialize_buffers()

Initialize the action and observation buffers with zeros.

objective(observation, action)

Calculate the value of the current state, given its observation and the action taken.

optimize_weights([time])

Compute optimized critic weights, possibly subject to constraints.

reset()

Reset the outcome and current critic loss variables, and re-initialize the buffers.

restore_weights()

Restore the model weights to the cached values.

update()

Update the value table.

update_and_cache_weights([weights])

Update the model’s weights and cache the new values.

update_buffers(observation, action)

Update the critic's buffers with the given observation and action.

update_outcome(observation, action)

Update the outcome variable based on the running objective and the current observation and action.

update_single_cell(observation)

Update the value function for a single state.

update_target(new_target)

Update the critic's target to the given new_target.

update_weights([weights])

Update the weights of the critic model.
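
Taken together, the summaries above suggest a per-step interaction pattern. The sketch below is illustrative only; the call order and the shapes of observation and action are assumptions, not documented behaviour.

    # Illustrative per-step usage (assumed call order):
    critic.update_buffers(observation, action)   # record the latest observation/action pair
    critic.update_outcome(observation, action)   # accumulate the running objective
    critic.update()                              # sweep the value table
    critic.update_and_cache_weights()            # commit the refreshed weights and cache them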

Attributes

optimizer_engine

Returns the engine used by the optimizer.
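
The caching-related methods hint at a snapshot-and-rollback pattern around constrained weight updates. The sketch below is a guess at how those pieces compose; in particular, the return values of optimize_weights and accept_or_reject_weights are assumptions and may differ in the actual implementation.

    critic.cache_weights()                                 # snapshot the current model weights
    candidate = critic.optimize_weights()                  # assumption: returns candidate weights
    verdict = critic.accept_or_reject_weights(candidate)   # constraint check on the candidate
    if verdict == "rejected":                              # assumption: verdict convention
        critic.restore_weights()                           # roll back to the cached snapshot
    else:
        critic.update_weights(candidate)                   # adopt the accepted weights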