Documentation ¶
Index ¶
- type Bandit
- type BetaDist
- type ContextMarshaler
- type ContextualRewardStub
- type Dist
- type EpsilonGreedy
- type ErrRewardNon2XX
- type HTTPSource
- type HTTPSourceOption
- type HttpDoer
- type Integrator
- type MarshalFunc
- type NormalDist
- type ParseFunc
- type PointDist
- type Proportional
- type Result
- type RewardParser
- type RewardSource
- type RewardStub
- type Sampler
- type Sha1Sampler
- type Strategy
- type Thompson
- type ThompsonMC
Examples ¶
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type Bandit ¶
type Bandit struct {
RewardSource
Strategy
Sampler
}
A Bandit gets reward values from a RewardSource, computes selection probabilities using a Strategy, and selects an arm using a Sampler.
func (*Bandit) SelectArm ¶
func (b *Bandit) SelectArm(ctx context.Context, unit string, banditContext interface{}) (Result, error)
SelectArm gets the current reward estimates, computes the arm selection probabilities, and selects an arm index. If an error is encountered at any point, it returns a partial result and the error. For example, if the reward estimates were retrieved but an error was encountered during the probability computation, the result will contain the reward estimates, but no probabilities or arm index.
There is an unfortunate name collision between a multi-armed bandit context and Go's context.Context type. The context.Context argument should only be used for passing request-scoped data to an external reward service, such as timeouts and cancellation propagation. The banditContext argument is used to pass bandit context features to the reward source for contextual bandits.
The unit argument is a string that will be hashed to select an arm with the pseudo-random sampler. SelectArm is deterministic for a fixed unit and set of reward estimates from the RewardSource.
Example ¶
rewards := []Dist{
Beta(1989, 21290),
Beta(40, 474),
Beta(64, 730),
Beta(71, 818),
Beta(52, 659),
Beta(59, 718),
}
b := Bandit{
RewardSource: &RewardStub{Rewards: rewards},
Strategy: NewThompson(numint.NewQuadrature()),
Sampler: NewSha1Sampler(),
}
result, err := b.SelectArm(context.Background(), "12345", nil)
if err != nil {
panic(err)
}
fmt.Println(result.Arm)
Output: 2
type ContextMarshaler ¶ added in v0.0.2
ContextMarshaler is called on the banditContext and the result will become the body of the request to the bandit service.
type ContextualRewardStub ¶ added in v0.0.2
ContextualRewardStub is a static contextual RewardSource that can be used for testing and development of contextual bandits. It assumes that the context can be specified with a string.
func (*ContextualRewardStub) GetRewards ¶ added in v0.0.2
func (c *ContextualRewardStub) GetRewards(ctx context.Context, banditContext interface{}) ([]Dist, error)
GetRewards gets the static rewards for a given banditContext string.
type Dist ¶
type Dist interface {
// CDF returns the cumulative distribution function evaluated at x.
CDF(x float64) float64
// Mean returns the mean of the distribution.
Mean() float64
// Prob returns the probability density function or probability mass function evaluated at x.
Prob(x float64) float64
// Rand returns a pseudo-random sample drawn from the distribution.
Rand() float64
// Support returns the range of values over which the distribution is considered non-zero for the purposes of numerical integration.
Support() (float64, float64)
}
A Dist represents a one-dimensional probability distribution. Reward estimates are represented as a Dist for each arm. Strategies compute arm-selection probabilities using the Dist interface. This allows for combining different distributions with different strategies.
func BetaFromJSON ¶ added in v0.0.2
BetaFromJSON converts a JSON-encoded array of Beta distributions to a []Dist. Expects the JSON data to be in the form:
`[{"alpha": 123, "beta": 456}, {"alpha": 3.1415, "beta": 9.999}]`
Returns an error if the alpha or beta value is missing or less than 1 for any arm. Any additional keys are ignored.
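The validation rules above can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: betaParams and parseBetaParams are hypothetical names, and the sketch returns raw (alpha, beta) pairs rather than a []Dist.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// betaParams is a hypothetical stand-in for one arm's Beta parameters.
// Pointer fields distinguish a missing key from an explicit value.
type betaParams struct {
	Alpha *float64 `json:"alpha"`
	Beta  *float64 `json:"beta"`
}

// parseBetaParams decodes a JSON array of {"alpha": ..., "beta": ...} objects
// and rejects arms with missing values or values below 1. Unknown keys are
// ignored, which is the default behavior of encoding/json.
func parseBetaParams(data []byte) ([][2]float64, error) {
	var raw []betaParams
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, fmt.Errorf("decoding rewards: %w", err)
	}
	out := make([][2]float64, 0, len(raw))
	for i, p := range raw {
		if p.Alpha == nil || p.Beta == nil {
			return nil, fmt.Errorf("arm %d: missing alpha or beta", i)
		}
		if *p.Alpha < 1 || *p.Beta < 1 {
			return nil, fmt.Errorf("arm %d: alpha and beta must be at least 1", i)
		}
		out = append(out, [2]float64{*p.Alpha, *p.Beta})
	}
	return out, nil
}

func main() {
	data := []byte(`[{"alpha": 123, "beta": 456}, {"alpha": 3.1415, "beta": 9.999}]`)
	params, err := parseBetaParams(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(params), "arms parsed")
}
```

Decoding into pointer fields is one way to detect a missing key; decoding into plain float64 fields would silently turn a missing value into zero.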
func NormalFromJSON ¶ added in v0.0.2
NormalFromJSON converts a JSON-encoded array of Normal distributions to a []Dist. Expects the JSON data to be in the form:
`[{"mu": 123, "sigma": 456}, {"mu": 3.1415, "sigma": 9.999}]`
Returns an error if the mu or sigma value is missing, or if sigma is less than 0, for any arm. Any additional keys are ignored.
func PointFromJSON ¶ added in v0.0.2
PointFromJSON converts a JSON-encoded array of Point distributions to a []Dist. Expects the JSON data to be in the form:
`[{"mu": 123}, {"mu": 3.1415}]`
Returns an error if the mu value is missing for any arm. Any additional keys are ignored.
type EpsilonGreedy ¶
type EpsilonGreedy struct {
Epsilon float64
// contains filtered or unexported fields
}
EpsilonGreedy implements the epsilon-greedy bandit strategy. The Epsilon parameter must be greater than zero. If any arm has a Null distribution, it will have zero selection probability, and the other arms' probabilities will be computed as if the Null arms are not present. Ties are accounted for, so if multiple arms have the maximum mean reward estimate, they will have equal probabilities.
func NewEpsilonGreedy ¶ added in v0.0.2
func NewEpsilonGreedy(e float64) *EpsilonGreedy
func (*EpsilonGreedy) ComputeProbs ¶
func (e *EpsilonGreedy) ComputeProbs(rewards []Dist) ([]float64, error)
ComputeProbs computes the arm selection probabilities from the set of reward estimates, accounting for Nulls and ties. Returns an error if epsilon is less than zero.
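As a rough illustration of how Nulls and ties can be handled, the sketch below computes epsilon-greedy probabilities from a slice of mean rewards. It is not the package's code: epsilonGreedyProbs, nullMean, and approxEqual are hypothetical names, a Null arm is modeled as a mean of negative infinity, and rejecting epsilon above 1 is an added assumption.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// nullMean models a Null arm: a mean reward of negative infinity.
var nullMean = math.Inf(-1)

// approxEqual compares floats with a small absolute tolerance.
func approxEqual(a, b float64) bool { return math.Abs(a-b) < 1e-12 }

// epsilonGreedyProbs spreads epsilon evenly over all non-Null arms and
// splits the remaining 1-epsilon evenly among the arms tied for the best mean.
func epsilonGreedyProbs(means []float64, epsilon float64) ([]float64, error) {
	if epsilon < 0 || epsilon > 1 { // the upper bound is an assumption of this sketch
		return nil, errors.New("epsilon must be in [0, 1]")
	}
	probs := make([]float64, len(means))
	best, active := math.Inf(-1), 0
	for _, m := range means {
		if math.IsInf(m, -1) {
			continue // Null arms are skipped entirely
		}
		active++
		if m > best {
			best = m
		}
	}
	if active == 0 {
		return probs, nil // every arm is Null; all probabilities stay zero
	}
	numBest := 0
	for _, m := range means {
		if !math.IsInf(m, -1) && m == best {
			numBest++
		}
	}
	for i, m := range means {
		if math.IsInf(m, -1) {
			continue
		}
		probs[i] = epsilon / float64(active)
		if m == best {
			probs[i] += (1 - epsilon) / float64(numBest)
		}
	}
	return probs, nil
}

func main() {
	probs, err := epsilonGreedyProbs([]float64{1, 3, 3, nullMean}, 0.1)
	if err != nil {
		panic(err)
	}
	fmt.Println(probs)
}
```

With three non-Null arms, two of them tied for the best mean, each tied arm receives half of 1-epsilon plus a third of epsilon, and the Null arm receives zero.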
type ErrRewardNon2XX ¶ added in v0.0.6
func (*ErrRewardNon2XX) Error ¶ added in v0.0.6
func (e *ErrRewardNon2XX) Error() string
type HTTPSource ¶ added in v0.0.2
type HTTPSource struct {
// contains filtered or unexported fields
}
HTTPSource is a basic implementation of RewardSource that gets reward estimates from an HTTP reward service.
func NewHTTPSource ¶ added in v0.0.2
func NewHTTPSource(client HttpDoer, url string, parser RewardParser, opts ...HTTPSourceOption) *HTTPSource
NewHTTPSource returns a new HTTPSource given an HttpDoer, a url for the reward service, and a RewardParser. Optionally provide a ContextMarshaler for encoding bandit context. For example, if a reward service running on localhost:1337 provides Beta reward estimates:
client := &http.Client{Timeout: 100 * time.Millisecond}
url := "http://localhost:1337/rewards"
parser := ParseFunc(BetaFromJSON)
marshaler := MarshalFunc(json.Marshal)
source := NewHTTPSource(client, url, parser, WithContextMarshaler(marshaler))
func (*HTTPSource) GetRewards ¶ added in v0.0.2
func (h *HTTPSource) GetRewards(ctx context.Context, banditContext interface{}) ([]Dist, error)
GetRewards makes a POST request to the reward URL, and parses the response into a []Dist. If a banditContext is provided, it will be marshaled and included in the body of the request.
type HTTPSourceOption ¶ added in v0.0.2
type HTTPSourceOption func(source *HTTPSource)
HTTPSourceOption allows for optional arguments to NewHTTPSource.
func WithContextMarshaler ¶ added in v0.0.2
func WithContextMarshaler(m ContextMarshaler) HTTPSourceOption
WithContextMarshaler is an optional argument to NewHTTPSource that sets the ContextMarshaler used to encode the bandit context.
type HttpDoer ¶ added in v0.0.2
HttpDoer is a basic interface for making HTTP requests. The net/http Client can be used, or you can bring your own. Heimdall is a pretty good alternative client with some nice features: https://github.com/gojek/heimdall
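Bringing your own client is also convenient in tests. The sketch below assumes the interface consists of a single Do method with the same signature as (*http.Client).Do; the interface body is not shown on this page, so that shape is an assumption. The names doer, cannedDoer, and fetchBody are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// doer mirrors the assumed shape of the interface: one Do method,
// which *http.Client already provides.
type doer interface {
	Do(req *http.Request) (*http.Response, error)
}

// cannedDoer is a hypothetical test double that returns a fixed response
// without touching the network.
type cannedDoer struct {
	status int
	body   string
}

func (c cannedDoer) Do(req *http.Request) (*http.Response, error) {
	return &http.Response{
		StatusCode: c.status,
		Header:     make(http.Header),
		Body:       io.NopCloser(strings.NewReader(c.body)),
		Request:    req,
	}, nil
}

// Compile-time checks that both clients satisfy the interface.
var (
	_ doer = (*http.Client)(nil)
	_ doer = cannedDoer{}
)

// fetchBody sends a POST through any doer and returns the response body,
// treating non-2XX statuses as errors.
func fetchBody(d doer, url string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return "", err
	}
	resp, err := d.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return "", fmt.Errorf("reward service returned status %d", resp.StatusCode)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	stub := cannedDoer{status: http.StatusOK, body: `[{"alpha": 123, "beta": 456}]`}
	body, err := fetchBody(stub, "http://localhost:1337/rewards")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Because the stub never opens a connection, code that depends on the reward service can be exercised deterministically with canned responses.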
type Integrator ¶
type MarshalFunc ¶ added in v0.0.2
MarshalFunc is an adapter to allow a normal function to be used as a ContextMarshaler.
func (MarshalFunc) Marshal ¶ added in v0.0.2
func (m MarshalFunc) Marshal(banditContext interface{}) ([]byte, error)
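The adapter follows the same pattern as http.HandlerFunc: a function type with a method that calls itself. The sketch below re-declares local stand-ins (contextMarshaler, marshalFunc) so it compiles on its own; the interface shape is inferred from the Marshal signature above and is an assumption, and marshalWith and jsonMarshaler are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// contextMarshaler is a local stand-in with the method set implied by
// the Marshal signature shown above.
type contextMarshaler interface {
	Marshal(banditContext interface{}) ([]byte, error)
}

// marshalFunc adapts a plain function to the contextMarshaler interface.
type marshalFunc func(banditContext interface{}) ([]byte, error)

// Marshal calls the underlying function.
func (m marshalFunc) Marshal(banditContext interface{}) ([]byte, error) {
	return m(banditContext)
}

// jsonMarshaler wraps json.Marshal, whose signature already matches.
var jsonMarshaler contextMarshaler = marshalFunc(json.Marshal)

// marshalWith encodes a bandit context with any contextMarshaler and
// returns the result as a string for easy inspection.
func marshalWith(m contextMarshaler, banditContext interface{}) (string, error) {
	b, err := m.Marshal(banditContext)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := marshalWith(jsonMarshaler, map[string]string{"segment": "new"})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Any function with the same signature can be converted the same way, which avoids declaring a struct type just to satisfy a one-method interface.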
type NormalDist ¶
func Normal ¶
func Normal(mu, sigma float64) NormalDist
Normal is a normal distribution for use with any bandit strategy. For the purposes of Thompson sampling, it is truncated at mean +/- 4*sigma.
func (NormalDist) String ¶
func (n NormalDist) String() string
func (NormalDist) Support ¶
func (n NormalDist) Support() (float64, float64)
type ParseFunc ¶ added in v0.0.2
ParseFunc is an adapter to allow a normal function to be used as a RewardParser.
type PointDist ¶
type PointDist struct {
Mu float64
}
func Null ¶ added in v0.0.2
func Null() PointDist
Null returns a PointDist with mean equal to negative infinity. This is a special value that indicates to a Strategy that this arm should get selection probability zero.
type Proportional ¶
type Proportional struct {
// contains filtered or unexported fields
}
Proportional is a trivial bandit strategy that returns arm-selection probabilities proportional to the mean reward estimate for each arm. This can be used when a bandit service wants to provide selection weights rather than reward estimates. Proportional treats Point(0) and Null() the same way, assigning them zero selection probability.
func NewProportional ¶ added in v0.0.2
func NewProportional() *Proportional
func (*Proportional) ComputeProbs ¶
func (p *Proportional) ComputeProbs(rewards []Dist) ([]float64, error)
ComputeProbs computes probabilities proportional to the mean reward of each arm. Returns an error if any arm has a negative finite mean reward. A mean reward of negative infinity is treated as zero, so that a Null() distribution is treated the same as Point(0).
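The rule above amounts to a guarded normalization. The following sketch works on a slice of mean rewards rather than a []Dist; proportionalProbs is a hypothetical name, and rejecting NaN and positive infinity, as well as returning all zeros when every weight is zero, are assumptions of the sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// proportionalProbs normalizes mean rewards into selection probabilities.
// A mean of negative infinity (a Null arm) is treated as zero weight.
func proportionalProbs(means []float64) ([]float64, error) {
	weights := make([]float64, len(means))
	total := 0.0
	for i, m := range means {
		switch {
		case math.IsInf(m, -1):
			weights[i] = 0 // Null arm: same as a point mass at zero
		case math.IsNaN(m) || math.IsInf(m, 1):
			return nil, errors.New("mean reward must be finite") // assumption
		case m < 0:
			return nil, errors.New("mean reward must not be negative")
		default:
			weights[i] = m
		}
		total += weights[i]
	}
	if total == 0 {
		return weights, nil // assumption: no arm has positive weight
	}
	for i := range weights {
		weights[i] /= total
	}
	return weights, nil
}

func main() {
	probs, err := proportionalProbs([]float64{1, 3, math.Inf(-1), 0})
	if err != nil {
		panic(err)
	}
	fmt.Println(probs)
}
```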
type Result ¶
Result is the return type for a call to Bandit.SelectArm. It will contain the reward estimates provided by the RewardSource, the computed arm selection probabilities, and the index of the selected arm.
type RewardParser ¶ added in v0.0.2
RewardParser will be called to convert the response from the reward service to a slice of distributions.
type RewardSource ¶
type RewardSource interface {
GetRewards(ctx context.Context, banditContext interface{}) ([]Dist, error)
}
A RewardSource provides the current reward estimates, in the form of a Dist for each arm.
There is an unfortunate name collision between a multi-armed bandit context and Go's Context type. The first argument is a context.Context and should only be used for passing request-scoped data to an external reward service. If the RewardSource does not require an external request, this first argument should always be context.Background().
The second argument is used to pass context values to the reward source for contextual bandits. A RewardSource implementation should provide the reward estimates conditioned on the value of banditContext. For non-contextual bandits, banditContext can be nil.
type RewardStub ¶
type RewardStub struct {
Rewards []Dist
}
RewardStub is a static non-contextual RewardSource that can be used for testing and development.
func (*RewardStub) GetRewards ¶
func (s *RewardStub) GetRewards(context.Context, interface{}) ([]Dist, error)
GetRewards gets the static rewards.
type Sampler ¶
A Sampler returns a pseudo-random arm index given a set of probabilities and a string to hash. Samplers should always return the same arm index for the same set of probabilities and unit value.
type Sha1Sampler ¶
type Sha1Sampler struct {
// contains filtered or unexported fields
}
Sha1Sampler is a Sampler that uses the SHA1 hash of the input unit to select an arm index with probability proportional to some given weights.
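One way to build such a deterministic sampler is to map the hash to a number in [0, 1) and walk the cumulative weights. The sketch below illustrates that idea only; it is not the package's implementation, it will not necessarily select the same arms as Sha1Sampler, and unitToFraction and sampleArm are hypothetical names.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"errors"
	"fmt"
)

// unitToFraction hashes the unit with SHA1 and maps the first 8 bytes of
// the digest to a float64 in [0, 1). Only the top 53 bits are kept so the
// conversion is exact and the result is strictly below 1.
func unitToFraction(unit string) float64 {
	sum := sha1.Sum([]byte(unit))
	n := binary.BigEndian.Uint64(sum[:8])
	return float64(n>>11) / float64(1<<53)
}

// sampleArm returns the index of the arm whose cumulative weight interval
// contains the unit's hashed position. Weights need not sum to 1.
func sampleArm(weights []float64, unit string) (int, error) {
	if len(weights) == 0 {
		return -1, errors.New("no arms to sample from")
	}
	total := 0.0
	for _, w := range weights {
		if w < 0 {
			return -1, errors.New("weights must not be negative")
		}
		total += w
	}
	if total <= 0 {
		return -1, errors.New("at least one weight must be positive")
	}
	target := unitToFraction(unit) * total
	cum, last := 0.0, -1
	for i, w := range weights {
		if w == 0 {
			continue // zero-weight arms can never be selected
		}
		cum += w
		last = i
		if target < cum {
			return i, nil
		}
	}
	return last, nil // guards against floating-point rounding at the upper edge
}

func main() {
	arm, err := sampleArm([]float64{0.2, 0.5, 0.3}, "12345")
	if err != nil {
		panic(err)
	}
	fmt.Println("selected arm:", arm)
}
```

Because the position depends only on the unit string, the same unit always lands on the same arm as long as the weights are unchanged.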
func NewSha1Sampler ¶
func NewSha1Sampler() *Sha1Sampler
type Thompson ¶
type Thompson struct {
// contains filtered or unexported fields
}
func NewThompson ¶
func NewThompson(integrator Integrator) *Thompson
type ThompsonMC ¶
type ThompsonMC struct {
NumIterations int
// contains filtered or unexported fields
}
ThompsonMC is a Monte Carlo implementation of the Thompson sampling Strategy. It should not be used in production; it is provided only as an example and for comparison with the Thompson Strategy, which is much faster and more accurate.
func NewThompsonMC ¶
func NewThompsonMC(numIterations int) *ThompsonMC
NewThompsonMC returns a new ThompsonMC with numIterations.
func (*ThompsonMC) ComputeProbs ¶
func (t *ThompsonMC) ComputeProbs(rewards []Dist) ([]float64, error)
ComputeProbs estimates the arm-selection probabilities by repeatedly sampling from the Dist for each arm, and counting how many times each arm yields the maximal sampled value.
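The counting procedure described above can be sketched as follows. This is an illustration under assumptions, not the package's code: arms are modeled as hypothetical sampleFunc closures instead of Dist values, the random source is seeded explicitly for reproducibility, and ties go to the lowest arm index.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// sampleFunc draws one reward sample for an arm using the given source.
type sampleFunc func(rng *rand.Rand) float64

// constant returns an arm whose sampled reward is always v.
func constant(v float64) sampleFunc {
	return func(*rand.Rand) float64 { return v }
}

// normal returns an arm with normally distributed rewards.
func normal(mu, sigma float64) sampleFunc {
	return func(rng *rand.Rand) float64 { return mu + sigma*rng.NormFloat64() }
}

// monteCarloProbs estimates, for each arm, the fraction of iterations in
// which that arm produced the largest sample.
func monteCarloProbs(arms []sampleFunc, numIterations int, seed int64) ([]float64, error) {
	if len(arms) == 0 {
		return nil, errors.New("no arms")
	}
	if numIterations <= 0 {
		return nil, errors.New("numIterations must be positive")
	}
	rng := rand.New(rand.NewSource(seed))
	wins := make([]int, len(arms))
	for it := 0; it < numIterations; it++ {
		best, bestVal := 0, arms[0](rng)
		for i := 1; i < len(arms); i++ {
			if v := arms[i](rng); v > bestVal {
				best, bestVal = i, v
			}
		}
		wins[best]++
	}
	probs := make([]float64, len(arms))
	for i, w := range wins {
		probs[i] = float64(w) / float64(numIterations)
	}
	return probs, nil
}

func main() {
	arms := []sampleFunc{normal(0.10, 0.02), normal(0.08, 0.02), normal(0.09, 0.02)}
	probs, err := monteCarloProbs(arms, 10000, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(probs)
}
```

The estimate carries sampling noise on the order of 1/sqrt(numIterations), which is one reason a quadrature-based strategy can be both faster and more accurate.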
Directories ¶
| Path | Synopsis |
|---|---|
| examples | |
| superstream_demo (module) | |
| numint | Package numint provides rules and methods for one-dimensional numerical quadrature |

