rcognita.critics.CriticCALF

class rcognita.critics.CriticCALF(*args, safe_decay_rate=0.001, is_dynamic_decay_rate=True, predictor=None, observation_init=None, safe_controller=None, penalty_param=0, is_predictive=True, **kwargs)
__init__(*args, safe_decay_rate=0.001, is_dynamic_decay_rate=True, predictor=None, observation_init=None, safe_controller=None, penalty_param=0, is_predictive=True, **kwargs)

Initialize a CriticCALF object.

Parameters
  • args – Arguments to be passed to the base class CriticOfObservation.

  • safe_decay_rate – Rate at which the safe set shrinks over time.

  • is_dynamic_decay_rate – Whether the safe decay rate is adjusted dynamically rather than kept constant.

  • predictor – A predictor object to be used to predict future observations.

  • observation_init – Initial observation to be used to initialize the safe set.

  • safe_controller – Safe controller object to be used to compute stabilizing actions.

  • penalty_param – Penalty parameter to be used in the CALF objective.

  • is_predictive – Whether the safe constraints are computed from predicted observations.

  • kwargs – Keyword arguments to be passed to the base class CriticOfObservation.

Methods

CALF_critic_lower_bound_constraint(weights)

Constraint that ensures that the value of the critic is above a certain lower bound.
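As a rough illustration of a lower-bound constraint of this kind, the sketch below returns a violation value that is non-positive when the critic value stays at or above a bound proportional to the squared observation norm. The quadratic critic, the function names, and the coefficient `lower_bound_coeff` are assumptions made for this example; they are not taken from the rcognita source.

```python
import numpy as np


def quadratic_critic(weights, observation):
    # Hypothetical critic model: weighted sum of squared observation components.
    return float(np.dot(weights, np.square(observation)))


def lower_bound_violation(weights, observation, lower_bound_coeff=1e-4):
    # Assumed bound: lower_bound_coeff * ||observation||^2.
    # A result <= 0 means the critic value is at or above the bound.
    bound = lower_bound_coeff * float(np.dot(observation, observation))
    return bound - quadratic_critic(weights, observation)
```

With this sign convention, a constrained optimizer would treat `lower_bound_violation(weights, observation) <= 0` as the feasible region.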

CALF_critic_upper_bound_constraint(weights)

Constraint that ensures that the value of the critic is below a certain upper bound.

CALF_decay_constraint_no_prediction(weights)

Constraint that ensures that the CALF value is decreasing by a certain rate.
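A decay constraint of this type can be sketched as a comparison between the critic value under candidate weights and a reference value under previously accepted weights. In the minimal example below, the quadratic critic, the `step_size` scaling, and all function and argument names are hypothetical; only `safe_decay_rate` and its default come from the signature above.

```python
import numpy as np


def quadratic_critic(weights, observation):
    # Hypothetical critic model: weighted sum of squared observation components.
    return float(np.dot(weights, np.square(observation)))


def decay_constraint_violation(
    candidate_weights,
    accepted_weights,
    observation,
    last_good_observation,
    safe_decay_rate=0.001,
    step_size=1.0,
):
    # Critic value at the current observation under the candidate weights.
    new_value = quadratic_critic(candidate_weights, observation)
    # Reference value at the last accepted observation under accepted weights.
    reference_value = quadratic_critic(accepted_weights, last_good_observation)
    # <= 0 means the value dropped by at least step_size * safe_decay_rate.
    return new_value - reference_value + step_size * safe_decay_rate
```

A non-positive return value indicates that the required decrease was achieved under these assumptions.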

CALF_decay_constraint_predicted_on_policy(weights)

Constraint for ensuring that the CALF function decreases at each iteration.

CALF_decay_constraint_predicted_safe_policy(weights)

Calculate the constraint violation for the CALF decay constraint when a predicted safe policy is used.

__init__(*args[, safe_decay_rate, …])

Initialize a CriticCALF object.

accept_or_reject_weights(weights[, …])

Determine whether to accept or reject the given weights based on whether they violate the given constraints.
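The accept-or-reject step can be pictured as evaluating each constraint on the candidate weights and keeping them only if none is violated. The sketch below is a stand-alone illustration: the function names, the `tolerance` argument, the returned status strings, and the convention that a constraint returns a value `<= 0` when satisfied are all assumptions, not the library's API.

```python
def constraints_satisfied(weights, constraint_functions, tolerance=0.0):
    # Each constraint is assumed to return a violation value that is
    # <= tolerance when the constraint holds.
    violations = [constraint(weights) for constraint in constraint_functions]
    return all(violation <= tolerance for violation in violations)


def accept_or_reject(current_weights, candidate_weights, constraint_functions):
    # Keep the candidate only if it satisfies every constraint; otherwise
    # fall back to the weights that were already in use.
    if constraints_satisfied(candidate_weights, constraint_functions):
        return "accepted", candidate_weights
    return "rejected", current_weights
```

For example, with the single hypothetical constraint `lambda w: sum(w) - 1.0`, candidate weights `[0.5]` would be kept and `[2.0]` would be discarded.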

cache_weights([weights])

Store a copy of the current model weights.

initialize_buffers()

Initialize the action and observation buffers with zeros.

objective(*args, **kwargs)

Objective of the critic: the sum of the squared temporal difference and, if the penalty parameter is non-zero, a penalty for violating the CALF decay constraint.
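The structure of such an objective can be sketched in a few lines. The example below assumes the temporal-difference errors are already computed and that the penalty grows linearly with positive constraint violation. The penalty shape and the names `critic_objective`, `td_errors`, and `decay_violation` are assumptions for illustration; only `penalty_param` appears in the signature above.

```python
import numpy as np


def critic_objective(td_errors, decay_violation, penalty_param=0.0):
    # Sum of squared temporal-difference errors over the buffered samples.
    td_term = float(np.sum(np.square(td_errors)))
    if penalty_param == 0:
        # No penalty term when the penalty parameter is zero.
        return td_term
    # Assumed penalty shape: only positive violations are penalized.
    return td_term + penalty_param * max(float(decay_violation), 0.0)
```

With `penalty_param=0` the decay constraint has no influence on the objective and would have to be enforced separately, for instance as an explicit optimizer constraint.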

optimize_weights([time])

Compute optimized critic weights, possibly subject to constraints.

reset()

Reset the outcome and current critic loss variables, and re-initialize the buffers.

restore_weights()

Restore the model weights from the cached weights.

update_and_cache_weights([weights])

Update the model’s weights and cache the new values.
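The caching methods listed here suggest a pattern in which the active weights can be rolled back to the last cached copy. The class below is a hypothetical stand-in that mimics the method names for illustration only; its attributes and internals are assumptions and do not reflect how rcognita stores weights.

```python
import numpy as np


class WeightStore:
    # Hypothetical stand-in for a model that keeps active and cached weights.
    def __init__(self, weights):
        self.weights = np.array(weights, dtype=float)
        self.cached_weights = self.weights.copy()

    def cache_weights(self, weights=None):
        # Cache the given weights, or the active ones if none are given.
        source = self.weights if weights is None else weights
        self.cached_weights = np.array(source, dtype=float)

    def restore_weights(self):
        # Roll the active weights back to the cached copy.
        self.weights = self.cached_weights.copy()

    def update_weights(self, weights):
        # Replace the active weights without touching the cache.
        self.weights = np.array(weights, dtype=float)

    def update_and_cache_weights(self, weights):
        # Replace the active weights and cache the same values.
        self.update_weights(weights)
        self.cache_weights(weights)
```

In this sketch, a rejected candidate would be handled by calling `restore_weights()`, and an accepted one by calling `update_and_cache_weights(candidate)`.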

update_buffers(observation, action)

Update data buffers and dynamic safe decay rate.

update_outcome(observation, action)

Update the outcome variable based on the running objective and the current observation and action.

update_target(new_target)

update_weights([weights])

Update the weights of the critic model.

Attributes

optimizer_engine

The engine used by the optimizer.