Documentation ¶
Overview ¶
Package ppo is an agent implementation of the Proximal Policy Optimization algorithm.
Index ¶
- Variables
- func GAE(values, masks, rewards []*t.Dense, gamma, lambda float32) (returns, advantage *t.Dense, err error)
- func MakeActor(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
- func MakeCritic(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
- func WithClip(val float64) func(*Loss)
- func WithCriticDiscount(val float32) func(*Loss)
- func WithEntropyBeta(val float32) func(*Loss)
- type Agent
- type AgentConfig
- type BatchedEvents
- type Event
- type Events
- type Hyperparameters
- type LayerBuilder
- type Loss
- type LossOpt
- type Memory
- type ModelConfig
Constants ¶
This section is empty.
Variables ¶
var DefaultActorConfig = &ModelConfig{
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultActorLayerBuilder,
	BatchSize:    20,
}
DefaultActorConfig is the default model configuration for the actor (policy).
var DefaultActorLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
	}
}
DefaultActorLayerBuilder is a default fully connected layer builder.
var DefaultAgentConfig = &AgentConfig{
	Hyperparameters: DefaultHyperparameters,
	Base:            agentv1.NewBase("PPO"),
	ActorConfig:     DefaultActorConfig,
	CriticConfig:    DefaultCriticConfig,
}
DefaultAgentConfig is the default config for a PPO agent.
var DefaultCriticConfig = &ModelConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultCriticLayerBuilder,
	BatchSize:    20,
}
DefaultCriticConfig is the default model configuration for the critic.
var DefaultCriticLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 24},
		layer.FC{Input: 24, Output: 24},
		layer.FC{Input: 24, Output: 1, Activation: layer.Tanh},
	}
}
DefaultCriticLayerBuilder is a default fully connected layer builder.
var DefaultHyperparameters = &Hyperparameters{
Gamma: 0.99,
Lambda: 0.95,
}
DefaultHyperparameters are the default hyperparameters.
Functions ¶
func GAE ¶
func GAE(values, masks, rewards []*t.Dense, gamma, lambda float32) (returns, advantage *t.Dense, err error)
GAE is generalized advantage estimation.
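To make the recurrence concrete, here is a hedged, scalar-slice sketch of generalized advantage estimation. It is not the package's implementation (the package's GAE operates on []*t.Dense tensors); the helper name gae and the assumption that values carries one extra bootstrap entry for the final state are illustrative only.

// gae is an illustrative, scalar-slice version of generalized advantage
// estimation. values must hold one extra entry, values[len(rewards)],
// used as the bootstrap value of the final state.
func gae(values, masks, rewards []float32, gamma, lambda float32) (returns, advantages []float32) {
	n := len(rewards)
	returns = make([]float32, n)
	advantages = make([]float32, n)
	var running float32
	for t := n - 1; t >= 0; t-- {
		// TD residual: r_t + γ·V(s_{t+1})·mask_t − V(s_t)
		delta := rewards[t] + gamma*values[t+1]*masks[t] - values[t]
		// Exponentially weighted sum of residuals, cut off at episode ends by the mask.
		running = delta + gamma*lambda*masks[t]*running
		advantages[t] = running
		returns[t] = running + values[t]
	}
	return returns, advantages
}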
func MakeActor ¶
func MakeActor(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
MakeActor makes the actor, which chooses actions according to the current policy.
func MakeCritic ¶
func MakeCritic(config *ModelConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
MakeCritic makes the critic, which creates a q-value based on the outcome of the action taken.
func WithClip ¶
func WithClip(val float64) func(*Loss)
WithClip sets the clip value used in the PPO surrogate objective.
func WithCriticDiscount ¶
func WithCriticDiscount(val float32) func(*Loss)
WithCriticDiscount sets the critic discount. Defaults to 0.5.
func WithEntropyBeta ¶
func WithEntropyBeta(val float32) func(*Loss)
WithEntropyBeta sets the entropy beta. Defaults to 0.001.
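The With* functions follow Go's functional-option pattern: each returns a func(*Loss) that adjusts a Loss during construction. A hedged sketch of assembling option values follows; 0.2 is an example clip value rather than a documented default, and the constructor that consumes these options is not shown in this documentation.

// Example option values; these would be passed to whatever constructs the Loss.
opts := []func(*Loss){
	WithClip(0.2),           // example clip value
	WithCriticDiscount(0.5), // matches the documented default
	WithEntropyBeta(0.001),  // matches the documented default
}
_ = opts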
Types ¶
type Agent ¶
type Agent struct {
// Base for the agent.
*agentv1.Base
// Hyperparameters for the agent.
*Hyperparameters
// Actor chooses actions.
Actor modelv1.Model
// Critic updates params.
Critic modelv1.Model
// Memory of the agent.
Memory *Memory
// contains filtered or unexported fields
}
Agent is a PPO agent.
func NewAgent ¶
func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)
NewAgent returns a new PPO agent.
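A minimal usage sketch, assuming an *envv1.Env named env has already been constructed elsewhere (environment setup is outside this package):

// env is assumed to be an already-constructed *envv1.Env.
agent, err := NewAgent(DefaultAgentConfig, env)
if err != nil {
	panic(err)
}
_ = agent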
type AgentConfig ¶
type AgentConfig struct {
// Base for the agent.
Base *agentv1.Base
// Hyperparameters for the agent.
*Hyperparameters
// ActorConfig is the actor model config.
ActorConfig *ModelConfig
// CriticConfig is the critic model config.
CriticConfig *ModelConfig
}
AgentConfig is the config for a PPO agent.
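For illustration, a hedged sketch of a config that keeps the default actor and critic models but overrides the hyperparameters; the specific values are arbitrary examples, not recommendations.

config := &AgentConfig{
	Base: agentv1.NewBase("PPO"),
	Hyperparameters: &Hyperparameters{
		Gamma:  0.98, // example value; the default is 0.99
		Lambda: 0.9,  // example value; the default is 0.95
	},
	ActorConfig:  DefaultActorConfig,
	CriticConfig: DefaultCriticConfig,
}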
type BatchedEvents ¶
type BatchedEvents struct {
States, ActionProbs, ActionOneHots, QValues, Masks, Rewards *tensor.Dense
Len int
}
BatchedEvents are the events as a batched tensor.
type Event ¶
Event is an event that occurred when interacting with an environment.
type Events ¶
Events is a collection of events that can be batched into tensors.
func (*Events) Batch ¶
func (e *Events) Batch() (events *BatchedEvents, err error)
Batch the events.
type Hyperparameters ¶
type Hyperparameters struct {
// Gamma is the discount factor (0≤γ≤1). It determines how much importance we want to give to future
// rewards. A discount factor close to 1 captures the long-term effective reward, whereas
// a discount factor of 0 makes the agent consider only the immediate reward, making it greedy.
Gamma float32
// Lambda is the smoothing factor used to reduce variance and stabilize training.
Lambda float32
}
Hyperparameters for the PPO agent.
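Purely as an illustration of the Gamma description above, a tiny sketch of how the discount factor weights a reward sequence (not package code):

// Discounted return G_0 = Σ γ^t · r_t for a short reward sequence.
rewards := []float32{1, 1, 1, 1}
gamma := float32(0.99)
var ret, weight float32 = 0, 1
for _, r := range rewards {
	ret += weight * r // adds γ^t · r_t
	weight *= gamma
}
// ret ≈ 3.94 here; with gamma = 0 only the first reward would count.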
type LayerBuilder ¶
LayerBuilder builds layers.
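A custom LayerBuilder can be assigned to ModelConfig.LayerBuilder to change the network architecture. A hedged example that mirrors DefaultActorLayerBuilder with wider hidden layers; the sizes are arbitrary.

// wideActorLayerBuilder is an illustrative custom layer builder.
var wideActorLayerBuilder = func(env *envv1.Env) []layer.Config {
	return []layer.Config{
		layer.FC{Input: env.ObservationSpaceShape()[0], Output: 64},
		layer.FC{Input: 64, Output: 64},
		layer.FC{Input: 64, Output: envv1.PotentialsShape(env.ActionSpace)[0], Activation: layer.Softmax},
	}
}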
type Loss ¶
type Loss struct {
// contains filtered or unexported fields
}
Loss is a custom loss for PPO. It is designed to ensure that the policy is never over-updated in a single step.
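For reference, the quantity PPO clips is the probability ratio between the new and old policies. A hedged scalar sketch of the standard clipped surrogate term follows; it illustrates the general PPO objective, not this package's exact loss implementation.

// clippedSurrogate is an illustrative scalar version of the standard PPO
// policy term: min(r·A, clip(r, 1−ε, 1+ε)·A), where r = πnew/πold.
func clippedSurrogate(newProb, oldProb, advantage, epsilon float32) float32 {
	ratio := newProb / oldProb
	clipped := ratio
	if clipped < 1-epsilon {
		clipped = 1 - epsilon
	}
	if clipped > 1+epsilon {
		clipped = 1 + epsilon
	}
	unclipped := ratio * advantage
	if clippedTerm := clipped * advantage; clippedTerm < unclipped {
		return clippedTerm
	}
	return unclipped
}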
type Memory ¶
type Memory struct {
// contains filtered or unexported fields
}
Memory for the PPO agent.
type ModelConfig ¶
type ModelConfig struct {
// Loss function to evaluate network performance.
Loss modelv1.Loss
// Optimizer to optimize the weights with respect to the error.
Optimizer g.Solver
// LayerBuilder is a builder of layers.
LayerBuilder LayerBuilder
// BatchSize of the updates.
BatchSize int
// Track is whether to track the model.
Track bool
}
ModelConfig are the hyperparameters for a model.
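A custom ModelConfig can be supplied via AgentConfig.ActorConfig or AgentConfig.CriticConfig. A hedged sketch built from the fields above; the BatchSize and Track values are arbitrary examples.

// Illustrative custom critic model config.
criticConfig := &ModelConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(),
	LayerBuilder: DefaultCriticLayerBuilder,
	BatchSize:    64,   // example value; the default configs use 20
	Track:        true, // example: enable model tracking
}
_ = criticConfig // would typically be set as AgentConfig.CriticConfig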