package her

v0.1.1 (latest) | Published: May 13, 2023 | License: Apache-2.0 | Imports: 13 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/m8u/gold


README

Hindsight Experience Replay

Hindsight experience replay allows an agent to learn in environments with sparse rewards and multiple goals.

How it works

HER uses universal value function approximators (UVFAs), which take the goal as an input alongside the state, and works by augmenting the experience replay with additional goals. The intuition is that there is valuable information to be learned even when the end goal is not reached: for example, if I miss a shot in basketball, I can still reason that had the hoop been slightly moved, I would have made it.
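A minimal sketch of the relabeling idea, assuming the "final" strategy from the HER paper (each transition's goal is replaced by the state actually reached at the end of the episode) and a sparse reward of 0 on success and -1 otherwise. `Transition`, `reward`, and `relabelFinal` are hypothetical names for illustration, not this package's API; the package exposes its version of this step as `Agent.Hindsight`.

```go
package main

import "fmt"

// Transition is a hypothetical goal-conditioned transition over bit vectors.
// It is a stand-in for illustration, not the package's Event type.
type Transition struct {
	State  []int
	Action int
	Next   []int
	Goal   []int
	Reward float32
}

// equal reports whether two bit vectors are identical.
func equal(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// reward is an assumed sparse reward: 0 when the goal is reached, -1 otherwise.
func reward(next, goal []int) float32 {
	if equal(next, goal) {
		return 0
	}
	return -1
}

// relabelFinal returns copies of an episode's transitions whose goal is the
// state reached at the end of the episode, with rewards recomputed for it.
func relabelFinal(episode []Transition) []Transition {
	if len(episode) == 0 {
		return nil
	}
	achieved := episode[len(episode)-1].Next
	out := make([]Transition, 0, len(episode))
	for _, t := range episode {
		t.Goal = achieved // t is a copy, so the original transition is untouched
		t.Reward = reward(t.Next, achieved)
		out = append(out, t)
	}
	return out
}

func main() {
	goal := []int{1, 1, 1}
	episode := []Transition{
		{State: []int{0, 0, 0}, Action: 0, Next: []int{1, 0, 0}, Goal: goal},
		{State: []int{1, 0, 0}, Action: 1, Next: []int{1, 1, 0}, Goal: goal},
	}
	for i := range episode {
		episode[i].Reward = reward(episode[i].Next, goal)
	}
	// Both the original and the relabeled transitions would be stored in replay.
	for _, t := range relabelFinal(episode) {
		fmt.Println(t.Next, t.Goal, t.Reward)
	}
}
```

In this episode the original goal `[1 1 1]` is never reached, so every original reward is -1. After relabeling with the achieved state `[1 1 0]`, the last transition earns a reward of 0, which gives the agent a success signal to learn from.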

HER is a kind of intrinsic curriculum learning in which the agent learns to reach smaller goals before reaching the larger ones.

Examples

See the experiments folder for example implementations.

Roadmap

Solve bitflip with n > 15 bits
More hindsight types
More environments (push-drag)


Documentation

Overview

Package her is an agent implementation of the Hindsight Experience Replay algorithm.

Index

Variables
func MakePolicy(name string, config *PolicyConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)
type Agent
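- func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)
- func (a *Agent) Action(state, goal *tensor.Dense) (action int, err error)
- func (a *Agent) Hindsight(episodeEvents Events) error
- func (a *Agent) Learn() error
- func (a *Agent) Remember(event ...*Event)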
type AgentConfig
type Event
- func NewEvent(state, goal *tensor.Dense, outcome *envv1.Outcome) *Event
- func (e *Event) Copy() *Event
- func (e *Event) Print()
type Events
- func (e Events) Copy() Events
type Hyperparameters
type LayerBuilder
type Memory
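- func NewMemory(size int) *Memory
- func (m *Memory) Len() int
- func (m *Memory) Remember(events ...*Event)
- func (m *Memory) Sample(batchsize int) (ret []*Event, err error)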
type PolicyConfig

Variables

var DefaultAgentConfig = &AgentConfig{
	Hyperparameters:  DefaultHyperparameters,
	PolicyConfig:     DefaultPolicyConfig,
	Base:             agentv1.NewBase("HER"),
	SuccessfulReward: 0,
	MemorySize:       1e4,
}

DefaultAgentConfig is the default config for a dqn+her agent.


var DefaultFCLayerBuilder = func(x, y *modelv1.Input) []layer.Config {
	return []layer.Config{
		layer.FC{Input: x.Squeeze()[0], Output: 512},
		layer.FC{Input: 512, Output: 512},
		layer.FC{Input: 512, Output: y.Squeeze()[0], Activation: layer.Linear},
	}
}

DefaultFCLayerBuilder is a default fully connected layer builder.
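A UVFA conditions its value estimate on the goal as well as the state. A common way to do that with a fully connected network like the one above is to concatenate state and goal into one input vector. How this package assembles the input `x` is not documented here, so the sketch below is an assumption; `uvfaInput` and `linearQ` are hypothetical helpers, and `linearQ` is a toy stand-in for a single linear layer, not the `layer.FC` type.

```go
package main

import "fmt"

// uvfaInput concatenates state and goal into a single input vector,
// so the network's input width is len(state)+len(goal).
func uvfaInput(state, goal []float32) []float32 {
	x := make([]float32, 0, len(state)+len(goal))
	x = append(x, state...)
	return append(x, goal...)
}

// linearQ is a toy fully connected layer with no activation, mapping the input
// to one Q-value per action. weights has shape [actions][len(x)].
func linearQ(weights [][]float32, bias, x []float32) []float32 {
	q := make([]float32, len(weights))
	for a, row := range weights {
		q[a] = bias[a]
		for i, w := range row {
			q[a] += w * x[i]
		}
	}
	return q
}

func main() {
	state := []float32{1, 0, 1}
	goal := []float32{1, 1, 1}
	x := uvfaInput(state, goal) // width 6: 3 state bits followed by 3 goal bits
	weights := [][]float32{ // 2 actions x 6 inputs, arbitrary values
		{0.1, 0, 0, 0, 0.2, 0},
		{0, 0.3, 0, 0, 0, 0.4},
	}
	bias := []float32{0, 0}
	fmt.Println(len(x), linearQ(weights, bias, x))
}
```

Because the goal is part of the input, the same network can score actions for any goal, which is what lets relabeled transitions with substitute goals be reused for training.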


var DefaultHyperparameters = &Hyperparameters{
	Epsilon:              common.DefaultDecaySchedule(),
	Gamma:                0.9,
	UpdateTargetEpisodes: 50,
}

DefaultHyperparameters are the default hyperparameters.


var DefaultPolicyConfig = &PolicyConfig{
	Loss:         modelv1.MSE,
	Optimizer:    g.NewAdamSolver(g.WithBatchSize(128), g.WithLearnRate(0.0005)),
	LayerBuilder: DefaultFCLayerBuilder,
	BatchSize:    128,
	Track:        true,
}

DefaultPolicyConfig is the default configuration for a policy.

Functions

func MakePolicy

func MakePolicy(name string, config *PolicyConfig, base *agentv1.Base, env *envv1.Env) (modelv1.Model, error)

MakePolicy makes a policy model with the given name from config for the given environment.

Types

type Agent

type Agent struct {
	// Base for the agent.
	*agentv1.Base

	// Hyperparameters for the dqn+her agent.
	*Hyperparameters

	Policy       model.Model
	TargetPolicy model.Model
	Epsilon      common.Schedule
	// contains filtered or unexported fields
}

Agent is a dqn+her agent.

func NewAgent

func NewAgent(c *AgentConfig, env *envv1.Env) (*Agent, error)

NewAgent returns a new dqn+her agent.

func (*Agent) Action

func (a *Agent) Action(state, goal *tensor.Dense) (action int, err error)

Action selects the best known action for the given state and goal.

func (*Agent) Hindsight

func (a *Agent) Hindsight(episodeEvents Events) error

Hindsight applies hindsight to the memory: it stores the given episode's events again with substitute goals.

func (*Agent) Learn

func (a *Agent) Learn() error

Learn trains the agent on a batch sampled from memory.
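Learn's internals are not documented here. Assuming it performs the standard DQN update (the config has a target network, `Gamma`, and an MSE loss by default), each sampled event is regressed toward the target y = r when the episode ended, and y = r + γ · max over a' of Q_target(s', g, a') otherwise. `Sample` and `tdTargets` are hypothetical names for this sketch.

```go
package main

import "fmt"

// Sample is a hypothetical replayed transition together with the target
// network's Q-values for the next state under the same goal.
type Sample struct {
	Reward     float32
	Done       bool
	NextTarget []float32 // Q_target(s', g, .), one entry per action
}

// tdTargets computes DQN regression targets:
// y = r if the transition ended the episode, else y = r + gamma * max_a Q_target.
func tdTargets(batch []Sample, gamma float32) []float32 {
	ys := make([]float32, len(batch))
	for i, s := range batch {
		ys[i] = s.Reward
		if !s.Done && len(s.NextTarget) > 0 {
			best := s.NextTarget[0]
			for _, v := range s.NextTarget[1:] {
				if v > best {
					best = v
				}
			}
			ys[i] += gamma * best
		}
	}
	return ys
}

func main() {
	batch := []Sample{
		{Reward: 0, Done: true, NextTarget: []float32{0.3, 0.7}},     // goal reached: no bootstrap
		{Reward: -1, Done: false, NextTarget: []float32{-2.0, -0.5}}, // bootstrap from the best next action
	}
	fmt.Println(tdTargets(batch, 0.9))
}
```

The policy's predicted Q-value for the taken action would then be pushed toward these targets by the loss, and the target network copied from the policy every `UpdateTargetEpisodes` episodes.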

func (*Agent) Remember

func (a *Agent) Remember(event ...*Event)

Remember stores events in the agent's memory.

type AgentConfig

type AgentConfig struct {
	// Base for the agent.
	Base *agentv1.Base

	// Hyperparameters for the agent.
	*Hyperparameters

	// PolicyConfig for the agent.
	PolicyConfig *PolicyConfig

	// SuccessfulReward is the reward for reaching the goal.
	SuccessfulReward float32

	// MemorySize is the size of the memory.
	MemorySize int
}

AgentConfig is the config for a dqn+her agent.

type Event

type Event struct {
	*envv1.Outcome

	// State by which the action was taken.
	State *tensor.Dense

	// Goal the agent is trying to reach.
	Goal *tensor.Dense
	// contains filtered or unexported fields
}

Event is an event that occurred.

func NewEvent

func NewEvent(state, goal *tensor.Dense, outcome *envv1.Outcome) *Event

NewEvent returns a new event.

func (*Event) Copy

func (e *Event) Copy() *Event

Copy returns a copy of the event.

func (*Event) Print

func (e *Event) Print()

Print the event.

type Events

type Events []*Event

Events that occurred.

func (Events) Copy

func (e Events) Copy() Events

Copy returns a copy of the events.

type Hyperparameters

type Hyperparameters struct {
	// Gamma is the discount factor (0≤γ≤1). It determines how much importance we want to give to future
	// rewards. A high value for the discount factor (close to 1) captures the long-term effective reward, whereas
	// a discount factor of 0 makes our agent consider only immediate reward, hence making it greedy.
	Gamma float32

	// Epsilon is the rate at which the agent should exploit vs explore.
	Epsilon common.Schedule

	// UpdateTargetEpisodes determines how often the target network updates its parameters.
	UpdateTargetEpisodes int
}

Hyperparameters for the dqn+her agent.
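The effect of `Gamma` described above can be made concrete with the discounted return G = r₀ + γ·r₁ + γ²·r₂ + ... over an episode's rewards. `discountedReturn` is a hypothetical helper for illustration, not part of this package.

```go
package main

import "fmt"

// discountedReturn computes G = sum over t of gamma^t * r_t, accumulating
// backwards so each step is r_t + gamma * (return from t+1).
func discountedReturn(rewards []float32, gamma float32) float32 {
	var g float32
	for t := len(rewards) - 1; t >= 0; t-- {
		g = rewards[t] + gamma*g
	}
	return g
}

func main() {
	// Three failed steps followed by reaching the goal (sparse reward).
	rewards := []float32{-1, -1, -1, 0}
	for _, gamma := range []float32{0, 0.5, 0.9} {
		fmt.Println(gamma, discountedReturn(rewards, gamma))
	}
}
```

With γ = 0 only the first reward counts, which is the greedy case the comment describes; as γ approaches 1, later rewards carry almost as much weight as immediate ones.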

type LayerBuilder

type LayerBuilder func(x, y *modelv1.Input) []layer.Config

LayerBuilder builds layers.

type Memory

type Memory struct {
	// contains filtered or unexported fields
}

Memory for the dqn agent.

func NewMemory

func NewMemory(size int) *Memory

NewMemory returns a new Memory store.

func (*Memory) Len

func (m *Memory) Len() int

Len returns the number of events in memory.

func (*Memory) Remember

func (m *Memory) Remember(events ...*Event)

Remember stores events in memory.

func (*Memory) Sample

func (m *Memory) Sample(batchsize int) (ret []*Event, err error)

Sample returns a batch of batchsize events drawn from memory.
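Memory's fields are unexported, so its storage policy is not visible here. A minimal sketch of a fixed-capacity replay buffer with the same Remember/Len/Sample shape, assuming the oldest entries are overwritten once full, that sampling is uniform without replacement, and that Sample errors when asked for more items than are stored. `ReplayMemory` and `NewReplayMemory` are hypothetical; the sketch is generic (Go 1.18+) so it stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// ReplayMemory is a hypothetical fixed-capacity ring buffer for replay.
type ReplayMemory[T any] struct {
	items []T
	next  int // slot to overwrite once the buffer is full (the oldest item)
	size  int
	rng   *rand.Rand
}

// NewReplayMemory returns an empty memory holding at most size items.
func NewReplayMemory[T any](size int, seed int64) *ReplayMemory[T] {
	return &ReplayMemory[T]{items: make([]T, 0, size), size: size, rng: rand.New(rand.NewSource(seed))}
}

// Remember stores items, overwriting the oldest once capacity is reached.
func (m *ReplayMemory[T]) Remember(items ...T) {
	if m.size <= 0 {
		return
	}
	for _, it := range items {
		if len(m.items) < m.size {
			m.items = append(m.items, it)
			continue
		}
		m.items[m.next] = it
		m.next = (m.next + 1) % m.size
	}
}

// Len returns the number of stored items.
func (m *ReplayMemory[T]) Len() int { return len(m.items) }

// Sample draws batchSize distinct items uniformly at random.
func (m *ReplayMemory[T]) Sample(batchSize int) ([]T, error) {
	if batchSize < 0 || batchSize > len(m.items) {
		return nil, errors.New("not enough items in memory")
	}
	idx := m.rng.Perm(len(m.items))[:batchSize]
	out := make([]T, batchSize)
	for i, j := range idx {
		out[i] = m.items[j]
	}
	return out, nil
}

func main() {
	m := NewReplayMemory[int](3, 1)
	m.Remember(1, 2, 3, 4) // 4 overwrites 1, the oldest item
	batch, err := m.Sample(2)
	fmt.Println(m.Len(), batch, err)
}
```

Uniform sampling from a large buffer breaks up the correlation between consecutive steps of an episode, which is the usual reason DQN-style agents replay from memory instead of training on transitions in order.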

type PolicyConfig

type PolicyConfig struct {
	// Loss function to evaluate network performance.
	Loss modelv1.Loss

	// Optimizer to optimize the weights with regards to the error.
	Optimizer g.Solver

	// LayerBuilder builds the policy's layers.
	LayerBuilder LayerBuilder

	// Batch size to train on.
	BatchSize int

	// Track is whether to track the model.
	Track bool
}

PolicyConfig is the configuration for a policy.


Directories

Path	Synopsis
experiments/bitflip	(command)
