bayesian

package
v0.0.0-...-ffc71f6 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 16, 2026 License: Apache-2.0 Imports: 5 Imported by: 0

Documentation

Overview

Package bayesian implements bayesian analysis for detecting change points.

Index

Constants

View Source
const (
	// Confidence interval tail is the default tail value for confidence analysis.
	// 0.005 tail gives 99% confidence interval.
	ConfidenceIntervalTail = 0.005
)

Variables

This section is empty.

Functions

func Lgamma

func Lgamma(x float64) float64

Types

type BetaDistribution

type BetaDistribution struct {
	// Alpha parameter of Beta distribution.
	// Must be greater than 0.
	Alpha float64
	// Beta parameter of the Beta distribution.
	// Must be greater than 0.
	Beta float64
}

BetaDistribution specifies the parameters of a Beta distribution. See https://en.wikipedia.org/wiki/Beta_distribution.

In this package, we use beta distribution as the prior for a test's propensity to produce unexpected results, because it simplifies the maths.

Alpha = 1, Beta = 1 indicates an uninformative (uniform) prior.

type ChangepointPredictor

type ChangepointPredictor struct {
	// The assumed prior likelihood of a change point existing at any given
	// source position. This should not be too high or we will infer the
	// presence of too many changepoints.
	ChangepointLikelihood float64

	// The prior for the rate at which a test's runs have any
	// failing test result.
	// This is the prior for estimating the ratio
	// HasUnexpected / Runs of a segment.
	//
	// Generally tests tend to be either consistently passing or
	// consistently failing, with a bias towards consistently
	// passing, so shape parameters Alpha < 1, Beta < 1, Alpha < Beta
	// are typically selected (e.g. alpha = 0.3, beta = 0.5).
	HasUnexpectedPrior BetaDistribution

	// The prior for the rate at which a retried test's runs have
	// only failing results, given they have at least
	// two results and one is failing.
	//
	// This is the prior for estimating UnexpectedAfterRetry / Retried.
	// Generally the result of retrying a fail inside a test run
	// either leads to a pass (fairly consistently) or another failure
	// (fairly consistently). Consequently, shape parameters Alpha < 1,
	// Beta < 1 are advised (e.g. alpha = 0.5, beta = 0.5).
	UnexpectedAfterRetryPrior BetaDistribution

	// Whether the source system is assigning source positions on a source ref
	// consecutively, with no gaps.
	//
	// The test results for a test history are usually sparse. I.E. there
	// are test results missing for some source positions even though
	// those source positions 'exist' (correspond to real commits/CLs).
	//
	// When quantifying the uncertainty of a change point's position,
	// we assume that the prior likelihood of the change point being at
	// any given source position is uniform (the same)**. If we know that
	// source positions are assigned consecutively, we can infer that there
	// is proportionately more likelihood a changepoint will fall in a range
	// where there is a large source position gap between test results than one
	// in which there is a small gap between test results.
	//
	// If source positions are not assigned consecutively, we cannot make
	// this assumption (as e.g. positions 10, 20, 1000 could be sequential
	// Android builds on a release branch). In this case, will assume the
	// prior likelihood of a changepoint occurring at any source position
	// *with a test result* is equally likely and assume all other positions
	// do not exist.
	//
	// ** At some point we will want to revise our uniform prior assumption if
	// risk-based testing becomes common (e.g. projects only run tests at a
	// position if that change can affect a test), as then the likelihood
	// of a changepoint occuring at a non-tested position is lower than a
	// tested position.
	SourcePositionsConsecutive bool
}

ChangepointPredictor estimates the likely position of, and uncertainty around, change points in test histories.

func (ChangepointPredictor) ChangePointPositionDistribution

func (a ChangepointPredictor) ChangePointPositionDistribution(history []*inputbuffer.Run) *model.PositionDistribution

ChangePointPositionDistribution returns a quantization of the changepoint commit start position probability distribution for the given history, assuming it has exactly one changepoint.

history must be in increasing source position order.

See PositionDistribution for interpretation.

For theory background, please see go/bayesian-changepoint-estimation.

func (ChangepointPredictor) ChangePoints

func (a ChangepointPredictor) ChangePoints(history []*inputbuffer.Run, tail float64) []inputbuffer.ChangePoint

ChangePoints runs changepoint detection and confidence interval analysis for history. history is sorted by commit position ascendingly (oldest commit first).

func (ChangepointPredictor) FindBestChangepoint

func (a ChangepointPredictor) FindBestChangepoint(history []*inputbuffer.Run) (relativeLikelihood float64, position int)

FindBestChangepoint finds the change point position that maximises the likelihood of observing the given test history.

It returns the position of the change point in the history slice, as well as the change in the log-likelihood of observing the given test history that comes from assuming there is a changepoint at that position (relative to the `no change point` case).

The semantics of the returned position are as follows: a position p means the history is segmented as history[:p] and history[p:], i.e. the new behaviour starts at p inclusive. If the returned position is 0, it means no change point position was better than the `no change point` case.

This method requires the provided history to be sorted by source position (either ascending or descending is fine). It allows multiple runs to be specified per source position, by including those runs as adjacent elements in the history slice.

Note that if multiple runs are specified per source position, the returned position will only ever be between two source positions in the history, i.e. it holds that history[position-1].SourcePosition != history[position].SourcePosition (or position == 0).

This method assumes a uniform prior for all change point positions, including the no change point case. If we are to bias towards the no change point case, thresholding should be applied to relativeLikelihood before considering the change point real.

type SequenceLikelihood

type SequenceLikelihood struct {
	// contains filtered or unexported fields
}

SequenceLikelihood is used to compute the likelihood of observing a particular sequence of pass/fail results, assuming the failure rate p of the test remains constant over the sequence, and p is unknown but has the known prior Beta(alpha,beta) for some alpha and beta where alpha > 0 and beta > 0.

See go/bayesian-changepoint-estimation for details.

func NewSequenceLikelihood

func NewSequenceLikelihood(prior BetaDistribution) SequenceLikelihood

func (SequenceLikelihood) LogLikelihood

func (s SequenceLikelihood) LogLikelihood(x, n int) float64

LogLikelihood returns the log-likelihood of observing a particular sequence of pass/fail results (Y_0, Y_1 ... Y_{n-1}) of a known length. i.e. returns log(P(Y)).

See LogLikelihoodPartial for more details.

func (SequenceLikelihood) LogLikelihoodPartial

func (s SequenceLikelihood) LogLikelihoodPartial(x, n int, maxProportion float64) float64

LogLikelihoodPartial returns the log-likelihood of:

  • observing a particular binary sequence Y_0, Y_1 ... Y_{n-1}, knowing its length AND
  • concurrently, the underlying proportion p (= P(Y_i = 1)) that led to that sequence being less than or equal to maxProportion.

i.e. the method returns log(P(Y AND p <= maxProportion)).

The following assumptions are made:

  • Y ~ BernoilliProcess(p, n), i.e. Y is n independent samples of a bernoilli process with probability p of observing a 1.
  • p ~ Beta(a, b), i.e. p is unknown but taken from a Beta distribution with parameters a and b. If a = b = 1 is provided, the Beta distribution collapses to the uniform (uninformative) prior. In practice, where test failure rate is being modelled, a and b < 1 make more sense as tests tend towards being either consistently failing or consistently passing.

maxProportion must be between 0.0 and 1.0 (inclusive), and if it is 1.0, the method returns P(Y).

While the method returns the probability of observing a precise sequence Y, the only statistics needed to evaluate this probability is x (the number of elements that are true) and n (the total), so these are the only values accepted as arguments. (As such, inputs are also constrained as 0 <= x <= n.)

Log-likelihoods are returned, as for larger values of n (e.g. n > 100) the probabilities for any particular sequence can get very small. Working in log-probabilities avoids floating point precision issues.

[1] https://en.wikipedia.org/wiki/Beta_distribution.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL