Documentation ¶
Overview ¶
Package ppo is an agent implementation of the Proximal Policy Optimization algorithm.
Index ¶
- Variables
- func GAE(values, masks, rewards []*t.Dense, gamma, lambda float32) (returns, advantage *t.Dense, err error)
- func MakeActor(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
- func MakeCritic(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
- func WithClip(val float64) func(*Loss)
- func WithCriticDiscount(val float32) func(*Loss)
- func WithEntropyBeta(val float32) func(*Loss)
- type Agent
- type AgentConfig
- type BatchedEvents
- type Event
- type Events
- type Hyperparameters
- type LayerBuilder
- type Loss
- type LossOpt
- type Memory
- type ModelConfig
Constants ¶
This section is empty.
Variables ¶
var DefaultActorConfig = &ModelConfig{
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultActorLayerBuilder,
	BatchSize:    20,
}
DefaultActorConfig is the default model configuration for the actor (policy).
var DefaultActorLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
	}
}
DefaultActorLayerBuilder is a default fully connected layer builder.
var DefaultAgentConfig = &AgentConfig{
	Hyperparameters: DefaultHyperparameters,
	Base:            agentv1.NewBase("PPO"),
	ActorConfig:     DefaultActorConfig,
	CriticConfig:    DefaultCriticConfig,
}
DefaultAgentConfig is the default config for a PPO agent.
var DefaultCriticConfig = &ModelConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultCriticLayerBuilder,
	BatchSize:    20,
}
DefaultCriticConfig is the default model configuration for the critic.
var DefaultCriticLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: 1, Activation: layer.Tanh},
	}
}
DefaultCriticLayerBuilder is a default fully connected layer builder.
var DefaultHyperparameters = &Hyperparameters{
Gamma: 0.99,
Lambda: 0.95,
}
DefaultHyperparameters are the default hyperparameters.
Functions ¶
func GAE ¶
func GAE(values, masks, rewards []*t.Dense, gamma, lambda float32) (returns, advantage *t.Dense, err error)
GAE is generalized advantage estimation.
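To make the recurrence concrete, here is a hedged, scalar-slice sketch of generalized advantage estimation. It is not the package's implementation (the package's GAE operates on []*t.Dense tensors); the helper name gae and the assumption that values carries one extra bootstrap entry for the final state are illustrative only.

// gae is an illustrative, scalar-slice version of generalized advantage
// estimation. values must hold one extra entry, values[len(rewards)],
// used as the bootstrap value of the final state.
func gae(values, masks, rewards []float32, gamma, lambda float32) (returns, advantages []float32) {
	n := len(rewards)
	returns = make([]float32, n)
	advantages = make([]float32, n)
	var running float32
	for t := n - 1; t >= 0; t-- {
		// TD residual: r_t + γ·V(s_{t+1})·mask_t − V(s_t)
		delta := rewards[t] + gamma*values[t+1]*masks[t] - values[t]
		// Exponentially weighted sum of residuals, cut off at episode ends by the mask.
		running = delta + gamma*lambda*masks[t]*running
		advantages[t] = running
		returns[t] = running + values[t]
	}
	return returns, advantages
}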
func MakeActor ¶
func MakeActor(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
MakeActor makes the actor, which chooses actions according to the current policy.
func MakeCritic ¶
func MakeCritic(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
MakeCritic makes the critic, which creates a q-value based on the outcome of the action taken.
func WithClip ¶
func WithClip(val float64) func(*Loss)
WithClip sets the clip value used in the PPO surrogate objective.
func WithCriticDiscount ¶
func WithCriticDiscount(val float32) func(*Loss)
WithCriticDiscount sets the critic discount. Defaults to 0.5.
func WithEntropyBeta ¶
func WithEntropyBeta(val float32) func(*Loss)
WithEntropyBeta sets the entropy beta. Defaults to 0.001.
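The With* functions follow Go's functional-option pattern: each returns a func(*Loss) that adjusts a Loss during construction. A hedged sketch of assembling option values follows; 0.2 is an example clip value rather than a documented default, and the constructor that consumes these options is not shown in this documentation.

// Example option values; these would be passed to whatever constructs the Loss.
opts := []func(*Loss){
	WithClip(0.2),           // example clip value
	WithCriticDiscount(0.5), // matches the documented default
	WithEntropyBeta(0.001),  // matches the documented default
}
_ = opts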
Types ¶
type Agent ¶
type Agent struct {
// Base for the agent.
*agentv1.Base
// Hyperparameters for the agent.
*Hyperparameters
// Actor chooses actions.
Actor modelv1.Model
// Critic updates params.
Critic modelv1.Model
// Memory of the agent.
Memory *Memory
// contains filtered or unexported fields
}
Agent is a PPO agent.
func NewAgent ¶
func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)
NewAgent returns a new PPO agent.
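A minimal usage sketch, assuming an *envv1.Env named env has already been constructed elsewhere (environment setup is outside this package):

// env is assumed to be an already-constructed *envv1.Env.
agent, err := NewAgent(DefaultAgentConfig, env)
if err != nil {
	panic(err)
}
_ = agent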
type AgentConfig ¶
type AgentConfig struct {
// Base for the agent.
Base *agentv1.Base
// Hyperparameters for the agent.
*Hyperparameters
// ActorConfig is the actor model config.
ActorConfig *ModelConfig
// CriticConfig is the critic model config.
CriticConfig *ModelConfig
}
AgentConfig is the config for a PPO agent.
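For illustration, a hedged sketch of a config that keeps the default actor and critic models but overrides the hyperparameters; the specific values are arbitrary examples, not recommendations.

config := &AgentConfig{
	Base: agentv1.NewBase("PPO"),
	Hyperparameters: &Hyperparameters{
		Gamma:  0.98, // example value; the default is 0.99
		Lambda: 0.9,  // example value; the default is 0.95
	},
	ActorConfig:  DefaultActorConfig,
	CriticConfig: DefaultCriticConfig,
}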
type BatchedEvents ¶
type BatchedEvents struct {
States, ActionProbs, ActionOneHots, QValues, Masks, Rewards *tensor.Dense
Len int
}
BatchedEvents are the events as a batched tensor.
type Event ¶
Event is an event that occurred when interacting with an environment.
type Events ¶
Events is a collection of events that can be batched into tensors.
func (*Events) Batch ¶
func (e *Events) Batch() (events *BatchedEvents, err error)
Batch the events.
type Hyperparameters ¶
type Hyperparameters struct {
// Gamma is the discount factor (0≤γ≤1). It determines how much importance we want to give to future
// rewards. A discount factor close to 1 captures the long-term effective reward, whereas
// a discount factor of 0 makes the agent consider only the immediate reward, making it greedy.
Gamma float32
// Lambda is the smoothing factor used to reduce variance and stabilize training.
Lambda float32
}
Hyperparameters for the PPO agent.
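Purely as an illustration of the Gamma description above, a tiny sketch of how the discount factor weights a reward sequence (not package code):

// Discounted return G_0 = Σ γ^t · r_t for a short reward sequence.
rewards := []float32{1, 1, 1, 1}
gamma := float32(0.99)
var ret, weight float32 = 0, 1
for _, r := range rewards {
	ret += weight * r // adds γ^t · r_t
	weight *= gamma
}
// ret ≈ 3.94 here; with gamma = 0 only the first reward would count.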
type LayerBuilder ¶
LayerBuilder builds layers.
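A custom LayerBuilder can be assigned to ModelConfig.LayerBuilder to change the network architecture. A hedged example that mirrors DefaultActorLayerBuilder with wider hidden layers; the sizes are arbitrary.

// wideActorLayerBuilder is an illustrative custom layer builder.
var wideActorLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 64},
		layer.FC{Input: 64, Output: 64},
		layer.FC{Input: 64, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
	}
}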
type Loss ¶
type Loss struct {
// contains filtered or unexported fields
}
Loss is a custom loss for PPO. It is designed to ensure that the policy is never over-updated in a single step.
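For reference, the quantity PPO clips is the probability ratio between the new and old policies. A hedged scalar sketch of the standard clipped surrogate term follows; it illustrates the general PPO objective, not this package's exact loss implementation.

// clippedSurrogate is an illustrative scalar version of the standard PPO
// policy term: min(r·A, clip(r, 1−ε, 1+ε)·A), where r = πnew/πold.
func clippedSurrogate(newProb, oldProb, advantage, epsilon float32) float32 {
	ratio := newProb / oldProb
	clipped := ratio
	if clipped < 1-epsilon {
		clipped = 1 - epsilon
	}
	if clipped > 1+epsilon {
		clipped = 1 + epsilon
	}
	unclipped := ratio * advantage
	if clippedTerm := clipped * advantage; clippedTerm < unclipped {
		return clippedTerm
	}
	return unclipped
}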
type Memory ¶
type Memory struct {
// contains filtered or unexported fields
}
Memory for the PPO agent.
type ModelConfig ¶
type ModelConfig struct {
// Loss function to evaluate network performance.
Loss modelv1.Loss
// Optimizer to optimize the weights with respect to the error.
Optimizer g.Solver
// LayerBuilder is a builder of layers.
LayerBuilder LayerBuilder
// BatchSize of the updates.
BatchSize int
// Track is whether to track the model.
Track bool
}
ModelConfig are the hyperparameters for a model.
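A custom ModelConfig can be supplied via AgentConfig.ActorConfig or AgentConfig.CriticConfig. A hedged sketch built from the fields above; the BatchSize and Track values are arbitrary examples.

// Illustrative custom critic model config.
criticConfig := &ModelConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultCriticLayerBuilder,
	BatchSize:    64,   // example value; the default configs use 20
	Track:        true, // example: enable model tracking
}
_ = criticConfig // would typically be set as AgentConfig.CriticConfig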