rcognita.critics.CriticOfObservation

class rcognita.critics.CriticOfObservation(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

A critic whose value estimate is a function of the observation only; the action is not an argument of the critic model.
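As a rough illustration of what "a function of the observation only" means, the sketch below evaluates a value estimate from an observation and a weight vector, with no action argument. The quadratic feature map and the names `quadratic_features` and `critic_value` are assumptions made for this example; they are not part of rcognita.

```python
import numpy as np


def quadratic_features(observation):
    """Hypothetical feature map: squared components of the observation."""
    observation = np.asarray(observation, dtype=float)
    return observation**2


def critic_value(weights, observation):
    """Value estimate that depends on the observation only (no action argument)."""
    return float(np.dot(weights, quadratic_features(observation)))


weights = np.array([1.0, 0.5])
value = critic_value(weights, [2.0, -2.0])  # 1.0 * 4.0 + 0.5 * 4.0
```

A critic of observation and action would instead take both arguments, which is the distinction this class name draws.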

__init__(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

Initialize a critic object.

Parameters
  • system_dim_input (int) – Dimension of the system input (the action)

  • system_dim_output (int) – Dimension of the system output (the observation)

  • data_buffer_size (int) – Maximum size of the data buffer

  • optimizer (Optional[Optimizer]) – Optimizer to use for training the critic

  • model (Optional[Model]) – Model to use for the critic

  • running_objective (Optional[Objective]) – Objective function to use for the critic

  • discount_factor (float) – Discount factor to use in the value calculation

  • observation_target (Optional[np.ndarray]) – Target observation for the critic

  • sampling_time (float) – Sampling time for the critic

  • critic_regularization_param (float) – Regularization parameter for the critic
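To make the roles of running_objective, observation_target and sampling_time more concrete, here is a minimal sketch of one way an accumulated outcome could be computed from them. The quadratic stage cost, the rectangle-rule accumulation, and the way the target shifts the observation are all assumptions for illustration; the library's actual formulas may differ.

```python
import numpy as np


def running_objective(observation, action, observation_target):
    """Hypothetical quadratic stage cost measured relative to the target."""
    error = np.asarray(observation, dtype=float) - observation_target
    action = np.asarray(action, dtype=float)
    return float(np.dot(error, error) + np.dot(action, action))


sampling_time = 0.01
observation_target = np.array([1.0, 0.0])

outcome = 0.0
samples = [([2.0, 0.0], [1.0]), ([1.0, 0.0], [0.0])]
for observation, action in samples:
    # Rectangle-rule accumulation: stage cost times the sampling interval.
    outcome += running_objective(observation, action, observation_target) * sampling_time
```

In this sketch the second sample sits exactly on the target with zero action, so it contributes nothing to the outcome.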

Methods

__init__(system_dim_input, …[, optimizer, …])

Initialize a critic object.

accept_or_reject_weights(weights[, …])

Determine whether to accept or reject the given weights based on whether they violate the given constraints.

cache_weights([weights])

Store a copy of the current model weights.

initialize_buffers()

Initialize the action and observation buffers with zeros.

objective([data_buffer, weights])

Objective of the critic, for example a squared temporal difference.

optimize_weights([time])

Compute optimized critic weights, possibly subject to constraints.

reset()

Reset the outcome and current critic loss variables, and re-initialize the buffers.

restore_weights()

Restore the model weights to the cached weights.

update_and_cache_weights([weights])

Update the model’s weights and cache the new values.

update_buffers(observation, action)

Update the buffers of the critic with the given observation and action.

update_outcome(observation, action)

Update the outcome variable based on the running objective and the current observation and action.

update_target(new_target)

update_weights([weights])

Update the weights of the critic model.
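The methods listed above suggest a typical cycle: fill the buffers, evaluate a temporal-difference objective, then accept or reject candidate weights. The toy class below sketches that cycle under stated assumptions. It is not the rcognita implementation: the quadratic value model, the rolling-buffer layout, the half-squared-TD loss, the `constraints` argument and the "g(weights) <= 0" convention are all hypothetical choices made for this example.

```python
import numpy as np


class ToyObservationCritic:
    """Illustrative stand-in for the documented interface, not rcognita code."""

    def __init__(self, dim_input, dim_output, buffer_size, discount_factor=1.0):
        self.dim_input = dim_input
        self.dim_output = dim_output
        self.buffer_size = buffer_size
        self.discount_factor = discount_factor
        self.weights = np.ones(dim_output)
        self.cached_weights = self.weights.copy()
        self.initialize_buffers()

    def initialize_buffers(self):
        # Zero-filled buffers with one row per stored sample.
        self.action_buffer = np.zeros((self.buffer_size, self.dim_input))
        self.observation_buffer = np.zeros((self.buffer_size, self.dim_output))

    def update_buffers(self, observation, action):
        # Drop the oldest row and append the newest sample as the last row.
        self.action_buffer = np.vstack([self.action_buffer[1:], action])
        self.observation_buffer = np.vstack([self.observation_buffer[1:], observation])

    def value(self, observation, weights):
        # Assumed quadratic model: weighted sum of squared observation components.
        return float(np.dot(weights, np.asarray(observation, dtype=float) ** 2))

    def objective(self, running_objective, weights=None):
        # Half the sum of squared temporal differences over consecutive rows.
        weights = self.weights if weights is None else weights
        total = 0.0
        for k in range(self.buffer_size - 1):
            obs_old = self.observation_buffer[k]
            obs_next = self.observation_buffer[k + 1]
            td = (
                self.value(obs_old, weights)
                - self.discount_factor * self.value(obs_next, weights)
                - running_objective(obs_old, self.action_buffer[k])
            )
            total += 0.5 * td**2
        return total

    def cache_weights(self):
        self.cached_weights = self.weights.copy()

    def restore_weights(self):
        self.weights = self.cached_weights.copy()

    def accept_or_reject_weights(self, weights, constraints=()):
        # Assumed convention: a constraint g is satisfied when g(weights) <= 0.
        if all(g(weights) <= 0.0 for g in constraints):
            self.weights = np.asarray(weights, dtype=float).copy()
            self.cache_weights()
            return "accepted"
        self.restore_weights()
        return "rejected"


def stage_cost(observation, action):
    """Hypothetical running objective: squared norms of observation and action."""
    return float(np.sum(observation**2) + np.sum(action**2))


critic = ToyObservationCritic(dim_input=1, dim_output=2, buffer_size=3)
critic.update_buffers([1.0, 0.0], [0.0])
critic.update_buffers([0.0, 0.0], [0.0])
loss = critic.objective(stage_cost)

nonnegative = lambda w: -np.min(w)  # satisfied when every weight is >= 0
first = critic.accept_or_reject_weights(np.array([-1.0, 1.0]), [nonnegative])
second = critic.accept_or_reject_weights(np.array([2.0, 1.0]), [nonnegative])
```

Caching before every accepted update means a rejected candidate can always be rolled back to the last weights that satisfied the constraints, which is the pattern the cache_weights and restore_weights pair appears designed for.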

Attributes

optimizer_engine

Return the engine used by the optimizer.