rcognita.critics.CriticCALF
- class rcognita.critics.CriticCALF(*args, safe_decay_rate=0.001, is_dynamic_decay_rate=True, predictor=None, observation_init=None, safe_controller=None, penalty_param=0, is_predictive=True, **kwargs)
- __init__(*args, safe_decay_rate=0.001, is_dynamic_decay_rate=True, predictor=None, observation_init=None, safe_controller=None, penalty_param=0, is_predictive=True, **kwargs)
Initialize a CriticCALF object.
- Parameters
args – Arguments to be passed to the base class CriticOfObservation.
safe_decay_rate – Rate at which the safe set shrinks over time.
is_dynamic_decay_rate – Whether the decay rate should be dynamic or not.
predictor – A predictor object to be used to predict future observations.
observation_init – Initial observation to be used to initialize the safe set.
safe_controller – Safe controller object to be used to compute stabilizing actions.
penalty_param – Penalty parameter to be used in the CALF objective.
is_predictive – Whether the safe constraints should be computed based on predictions or not.
kwargs – Keyword arguments to be passed to the base class CriticOfObservation.
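To make the roles of safe_decay_rate and penalty_param concrete, the following is a minimal sketch, not rcognita's implementation. It assumes a hypothetical critic that is linear in squared observation components, a decay condition of the form critic(current) - critic(previous) + safe_decay_rate <= 0, and a hinge-style penalty. The names critic_value, decay_violation and penalized_objective are illustrative only.

```python
import numpy as np


def critic_value(weights, observation):
    """Hypothetical critic: weighted sum of squared observation components."""
    return float(np.dot(weights, np.square(observation)))


def decay_violation(weights, obs_prev, obs_curr, safe_decay_rate):
    """Assumed decay condition: the value must drop by at least safe_decay_rate.

    A result <= 0 means the condition holds; a result > 0 measures the violation.
    """
    return (
        critic_value(weights, obs_curr)
        - critic_value(weights, obs_prev)
        + safe_decay_rate
    )


def penalized_objective(td_error, violation, penalty_param):
    """Squared temporal difference plus a hinge penalty on the violation.

    With penalty_param == 0 the penalty term vanishes.
    """
    return td_error**2 + penalty_param * max(0.0, violation)


weights = np.array([1.0, 2.0])
obs_prev = np.array([1.0, 1.0])  # critic value: 1*1 + 2*1 = 3.0
obs_curr = np.array([0.5, 0.5])  # critic value: 0.25 + 0.5 = 0.75

# Negative result: the critic value decreased by more than safe_decay_rate.
violation = decay_violation(weights, obs_prev, obs_curr, safe_decay_rate=0.001)
```

In this sketch a satisfied decay condition leaves the objective equal to the squared temporal difference, while a positive violation is scaled by penalty_param and added on top.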
Methods
CALF_critic_lower_bound_constraint(weights) – Constraint that ensures that the value of the critic is above a certain lower bound.
CALF_critic_upper_bound_constraint(weights) – Constraint that ensures that the value of the critic is below a certain upper bound.
CALF_decay_constraint_no_prediction(weights) – Constraint that ensures that the CALF value is decreasing by a certain rate.
CALF_decay_constraint_predicted_on_policy(weights) – Constraint for ensuring that the CALF function decreases at each iteration.
CALF_decay_constraint_predicted_safe_policy(weights) – Calculate the constraint violation for the CALF decay constraint when a predicted safe policy is used.
__init__(*args[, safe_decay_rate, …]) – Initialize a CriticCALF object.
accept_or_reject_weights(weights[, …]) – Determine whether to accept or reject the given weights based on whether they violate the given constraints.
cache_weights([weights]) – Stores a copy of the current model weights.
initialize_buffers() – Initialize the action and observation buffers with zeros.
objective(*args, **kwargs) – Objective of the critic: the squared temporal difference plus, if the penalty parameter is non-zero, a penalty for violating the CALF decay constraint.
optimize_weights([time]) – Compute optimized critic weights, possibly subject to constraints.
reset() – Reset the outcome and current critic loss variables, and re-initialize the buffers.
restore_weights() – Restores the model weights to the cached weights.
update_and_cache_weights([weights]) – Update the model’s weights and cache the new values.
update_buffers(observation, action) – Update data buffers and dynamic safe decay rate.
update_outcome(observation, action) – Update the outcome variable based on the running objective and the current observation and action.
update_target(new_target)
update_weights([weights]) – Update the weights of the critic model.
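The interplay of cache_weights, restore_weights, update_and_cache_weights and accept_or_reject_weights can be pictured with a toy class. This is a hedged sketch under stated assumptions, not the library's code: ToyCritic, the convention that a constraint function returns a violation (a value <= atol counts as satisfied), the nonnegative_weights constraint and the returned strings are all hypothetical.

```python
import numpy as np


class ToyCritic:
    """Hypothetical stand-in showing a cache/accept/reject weight workflow."""

    def __init__(self, weights):
        self.weights = np.array(weights, dtype=float)
        self.cached_weights = self.weights.copy()

    def cache_weights(self, weights=None):
        # Store a copy so later changes to the source do not alter the cache.
        source = self.weights if weights is None else weights
        self.cached_weights = np.array(source, dtype=float)

    def restore_weights(self):
        # Fall back to the last cached (accepted) weights.
        self.weights = self.cached_weights.copy()

    def update_and_cache_weights(self, weights):
        self.weights = np.array(weights, dtype=float)
        self.cache_weights()

    def accept_or_reject_weights(self, weights, constraint_functions=(), atol=1e-10):
        # Assumed convention: each constraint returns a violation; <= atol is OK.
        violations = [constraint(weights) for constraint in constraint_functions]
        if all(v <= atol for v in violations):
            return "accepted"
        return "rejected"


def nonnegative_weights(weights):
    """Illustrative constraint: violated when any weight is negative."""
    return float(-np.min(weights))


critic = ToyCritic([1.0, 1.0])
for candidate in ([2.0, 3.0], [-1.0, 4.0]):
    candidate = np.array(candidate)
    if critic.accept_or_reject_weights(candidate, [nonnegative_weights]) == "accepted":
        critic.update_and_cache_weights(candidate)
    else:
        critic.restore_weights()
```

Here the first candidate is accepted and cached, and the second is rejected, so the toy critic keeps the previously cached weights.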
Attributes
optimizer_engine – Returns the engine used by the optimizer.