
class rcognita.critics.Critic(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

Critic base class.

A critic is an object that estimates or provides the value of a given action or state in a reinforcement learning problem.

The critic estimates the value of an action by learning from past experience, typically through the optimization of a loss function.
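As an illustration of "learning through the optimization of a loss function", the following is a minimal sketch of a temporal-difference loss for a linear critic. It is not the rcognita implementation: the function name `td_loss`, the linear value model, and the cost-style sign convention (value equals the running objective plus the discounted next value) are all assumptions made for this example.

```python
def td_loss(weights, observations, running_objectives, discount_factor=1.0):
    """Mean squared temporal-difference error of a linear critic (hypothetical helper)."""

    def value(observation):
        # Linear critic: the value is a weighted sum of the observation components.
        return sum(w * x for w, x in zip(weights, observation))

    values = [value(obs) for obs in observations]
    # TD error at step k: V(s_k) - discount_factor * V(s_{k+1}) - r_k
    td_errors = [
        values[k] - discount_factor * values[k + 1] - running_objectives[k]
        for k in range(len(values) - 1)
    ]
    return sum(e * e for e in td_errors) / len(td_errors)
```

An optimizer would then search for the weights that minimize this loss over the buffered data.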

__init__(system_dim_input: int, system_dim_output: int, data_buffer_size: int, optimizer: Optional[rcognita.optimizers.Optimizer] = None, model: Optional[rcognita.models.Model] = None, running_objective: Optional[rcognita.objectives.Objective] = None, discount_factor: float = 1.0, observation_target: Optional[numpy.ndarray] = None, sampling_time: float = 0.01, critic_regularization_param: float = 0.0)

Initialize a critic object.

  • system_dim_input (int) – Dimension of the system input (the action)

  • system_dim_output (int) – Dimension of the system output (the observation)

  • data_buffer_size (int) – Maximum size of the data buffer

  • optimizer (Optional[Optimizer]) – Optimizer to use for training the critic

  • model (Optional[Model]) – Model to use for the critic

  • running_objective (Optional[Objective]) – Objective function to use for the critic

  • discount_factor (float) – Discount factor to use in the value calculation

  • observation_target (Optional[np.ndarray]) – Target observation for the critic

  • sampling_time (float) – Sampling time for the critic

  • critic_regularization_param (float) – Regularization parameter for the critic
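To show how `discount_factor` and `sampling_time` could enter a value calculation, here is a minimal sketch of a discounted sum of running-objective samples. The helper `discounted_outcome` is hypothetical, and weighting each sample by the sampling time is an assumption about the discretization, not a statement of how rcognita computes it.

```python
def discounted_outcome(running_objectives, discount_factor=1.0, sampling_time=0.01):
    """Discounted sum of running-objective samples (hypothetical helper)."""
    total = 0.0
    for k, running_objective in enumerate(running_objectives):
        # Step k is weighted by discount_factor**k; sampling_time approximates
        # the integral of the running objective over one time step.
        total += discount_factor ** k * running_objective * sampling_time
    return total
```

With the default `discount_factor` of 1.0 all steps are weighted equally; values below 1.0 make later steps count less.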


__init__(system_dim_input, …[, optimizer, …])

Initialize a critic object.

accept_or_reject_weights(weights[, …])

Accept or reject the given weights depending on whether they violate the given constraints.
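A constraint check of this kind might be sketched as follows. The function name, the convention that a constraint `g` is satisfied when `g(weights) <= 0`, and the string return values are assumptions for illustration only.

```python
def accept_or_reject(weights, constraint_functions):
    """Return "accepted" if no constraint is violated, otherwise "rejected".

    Hypothetical convention: a constraint g is satisfied when g(weights) <= 0.
    """
    violated = any(g(weights) > 0 for g in constraint_functions)
    return "rejected" if violated else "accepted"
```

For example, a constraint such as `lambda w: sum(x * x for x in w) - 1.0` would reject any weight vector whose Euclidean norm exceeds 1.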


Store a copy of the current model weights.


Initialize the action and observation buffers with zeros.



Compute optimized critic weights, possibly subject to constraints.


Reset the outcome and current critic loss variables, and re-initialize the buffers.


Restore the model weights to the cached weights.


Update the model’s weights and cache the new values.
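The store, restore, and update-and-cache operations together form a simple rollback pattern, sketched below in plain Python. The class and method names are hypothetical and a list of floats stands in for the model weights; the real critic delegates to its model object.

```python
class CachedWeights:
    """Weights with a cached copy that can be restored (hypothetical sketch)."""

    def __init__(self, weights):
        self.weights = list(weights)
        self.cached_weights = list(weights)

    def cache_weights(self):
        # Keep an independent copy so later changes to weights do not affect it.
        self.cached_weights = list(self.weights)

    def restore_weights(self):
        # Roll back to the last cached copy.
        self.weights = list(self.cached_weights)

    def update_and_cache_weights(self, new_weights):
        # Adopt the new weights and make them the new rollback point.
        self.weights = list(new_weights)
        self.cache_weights()
```

A typical use is to restore the cached weights when newly optimized weights are rejected, and to update and cache them when they are accepted.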

update_buffers(observation, action)

Update the critic's buffers with the given observation and action.
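A fixed-size buffer that starts filled with zeros and discards its oldest entry on every update can be sketched with `collections.deque`. The class name `RollingBuffers` and the choice of `deque` are assumptions for this example; the library itself may store its buffers as arrays.

```python
from collections import deque


class RollingBuffers:
    """Zero-initialized, fixed-size action and observation buffers (hypothetical sketch)."""

    def __init__(self, dim_input, dim_output, buffer_size):
        # maxlen makes each deque drop its oldest entry when a new one is appended.
        self.action_buffer = deque(
            ([0.0] * dim_input for _ in range(buffer_size)), maxlen=buffer_size
        )
        self.observation_buffer = deque(
            ([0.0] * dim_output for _ in range(buffer_size)), maxlen=buffer_size
        )

    def update_buffers(self, observation, action):
        # The newest sample goes to the end; the oldest falls off the front.
        self.observation_buffer.append(list(observation))
        self.action_buffer.append(list(action))
```

The buffer length stays equal to `buffer_size` at all times, so downstream code can rely on a fixed shape.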

update_outcome(observation, action)

Update the outcome variable based on the running objective and the current observation and action.



Update the weights of the critic model.



Return the engine used by the optimizer.