topk

package module

v1.1.4 Latest Latest Go to latest Published: Sep 27, 2024 License: MIT Imports: 8 Imported by: 1

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/keilerkonzept/topk

Links

Open Source Insights

README ¶

topk

Sliding-window and regular top-K sketches.

A fast implementation of the HeavyKeeper top-K sketch inspired by the segmentio implementation and RedisBloom implementation. Significantly faster (~1.5x) than segmentio/topk on small sketches (k <= 1000) and much faster (10x-90x) on large sketches (k >= 10000).
A sliding-window top-K sketch, also based on HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows"

import (
	"github.com/keilerkonzept/topk" // plain sketch
	"github.com/keilerkonzept/topk/sliding" // sliding-window sketch
)

Demo application: top K requesting IPs within a sliding time window from a web server access logs dataset

Sliding Top-K Demo Application

Examples

Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk"
)

func main() {
	// make a new sketch keeping track of k=3 items using 1024x3 = 3072 buckets.
	sketch := topk.New(3, topk.WithWidth(1024), topk.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Sliding-window Top-K Sketch

package main

import (
	"log"
	"github.com/keilerkonzept/topk/sliding"
)

func main() {
	// make a new sketch keeping track of k=3 items over a window of the last 60 ticks
	// use width=1024 x depth=3 = 3072 buckets
	sketch := sliding.New(3, 60, sliding.WithWidth(1024), sliding.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Tick()                     // advance time by one tick
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Ticks(2)                   // advance time by two ticks
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Benchmarks

Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation	K	Depth	Width	time	allocs
`Add`	10	3	1024	358.6 ns/op	0 allocs/op
`Add`	10	3	8192	375.0 ns/op	0 allocs/op
`Add`	10	4	1024	449.9 ns/op	0 allocs/op
`Add`	10	4	8192	436.0 ns/op	0 allocs/op
`Add`	100	3	1024	371.5 ns/op	0 allocs/op
`Add`	100	3	8192	387.9 ns/op	0 allocs/op
`Add`	100	4	1024	452.3 ns/op	0 allocs/op
`Add`	100	4	8192	471.4 ns/op	0 allocs/op
`Incr`	10	3	1024	257.2 ns/op	0 allocs/op
`Incr`	10	3	8192	232.3 ns/op	0 allocs/op
`Incr`	10	4	1024	249.1 ns/op	0 allocs/op
`Incr`	10	4	8192	251.2 ns/op	0 allocs/op
`Incr`	100	3	1024	264.2 ns/op	0 allocs/op
`Incr`	100	3	8192	227.4 ns/op	0 allocs/op
`Incr`	100	4	1024	267.1 ns/op	0 allocs/op
`Incr`	100	4	8192	261.3 ns/op	0 allocs/op
`Count`	10	3	1024	216.0 ns/op	0 allocs/op
`Count`	10	3	8192	215.4 ns/op	0 allocs/op
`Count`	10	4	1024	220.0 ns/op	0 allocs/op
`Count`	10	4	8192	269.3 ns/op	0 allocs/op
`Count`	100	3	1024	235.1 ns/op	0 allocs/op
`Count`	100	3	8192	277.1 ns/op	0 allocs/op
`Count`	100	4	1024	278.7 ns/op	0 allocs/op
`Count`	100	4	8192	302.2 ns/op	0 allocs/op
`Query`	10	3	1024	129.6 ns/op	0 allocs/op
`Query`	10	3	8192	98.21 ns/op	0 allocs/op
`Query`	10	4	1024	129.9 ns/op	0 allocs/op
`Query`	10	4	8192	114.3 ns/op	0 allocs/op
`Query`	100	3	1024	141.2 ns/op	0 allocs/op
`Query`	100	3	8192	140.8 ns/op	0 allocs/op
`Query`	100	4	1024	131.1 ns/op	0 allocs/op
`Query`	100	4	8192	109.8 ns/op	0 allocs/op

Sliding-Window Top-K Sketch

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk/sliding
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation	K	Depth	Width	Window size	History size	time	allocs
`Add`	10	3	1024	100	50	696.9 ns/op	0 allocs/op
`Add`	10	3	1024	100	100	1051 ns/op	0 allocs/op
`Add`	10	3	8192	100	50	784.9 ns/op	0 allocs/op
`Add`	10	3	8192	100	100	1146 ns/op	0 allocs/op
`Add`	100	3	1024	100	50	712.9 ns/op	0 allocs/op
`Add`	100	3	1024	100	100	1054 ns/op	0 allocs/op
`Add`	100	3	8192	100	50	763.3 ns/op	0 allocs/op
`Add`	100	3	8192	100	100	1139 ns/op	0 allocs/op
`Incr`	10	3	1024	100	50	434.9 ns/op	0 allocs/op
`Incr`	10	3	1024	100	100	560.7 ns/op	0 allocs/op
`Incr`	10	3	8192	100	50	501.1 ns/op	0 allocs/op
`Incr`	10	3	8192	100	100	728.7 ns/op	0 allocs/op
`Incr`	100	3	1024	100	50	425.6 ns/op	0 allocs/op
`Incr`	100	3	1024	100	100	580.0 ns/op	0 allocs/op
`Incr`	100	3	8192	100	50	497.8 ns/op	0 allocs/op
`Incr`	100	3	8192	100	100	746.2 ns/op	0 allocs/op
`Count`	10	3	1024	100	50	228.5 ns/op	0 allocs/op
`Count`	10	3	1024	100	100	209.3 ns/op	0 allocs/op
`Count`	10	3	8192	100	50	234.5 ns/op	0 allocs/op
`Count`	10	3	8192	100	100	230.7 ns/op	0 allocs/op
`Count`	100	3	1024	100	50	237.5 ns/op	0 allocs/op
`Count`	100	3	1024	100	100	242.8 ns/op	0 allocs/op
`Count`	100	3	8192	100	50	246.5 ns/op	0 allocs/op
`Count`	100	3	8192	100	100	243.4 ns/op	0 allocs/op
`Query`	10	3	1024	100	50	101.7 ns/op	0 allocs/op
`Query`	10	3	1024	100	100	104.8 ns/op	0 allocs/op
`Query`	10	3	8192	100	50	114.0 ns/op	0 allocs/op
`Query`	10	3	8192	100	100	114.5 ns/op	0 allocs/op
`Query`	100	3	1024	100	50	135.9 ns/op	0 allocs/op
`Query`	100	3	1024	100	100	118.5 ns/op	0 allocs/op
`Query`	100	3	8192	100	50	130.1 ns/op	0 allocs/op
`Query`	100	3	8192	100	100	131.5 ns/op	0 allocs/op
`Tick`	10	3	1024	100	50	4191 ns/op	0 allocs/op
`Tick`	10	3	1024	100	100	7010 ns/op	0 allocs/op
`Tick`	10	3	8192	100	50	28699 ns/op	0 allocs/op
`Tick`	10	3	8192	100	100	90979 ns/op	0 allocs/op
`Tick`	100	3	1024	100	50	6539 ns/op	0 allocs/op
`Tick`	100	3	1024	100	100	9343 ns/op	0 allocs/op
`Tick`	100	3	8192	100	50	31349 ns/op	0 allocs/op
`Tick`	100	3	8192	100	100	87488 ns/op	0 allocs/op

Comparison with segmentio/topk

Using benchstat:

$ go test -run='^$' -bench=BenchmarkSketchAddForComparison -count=10 | tee new.txt
$ go test -run='^$' -bench=BenchmarkSegmentioTopkSample -count=10 | tee old.txt
$ benchstat -row /K,/Depth,/Width,/Decay -col .name old.txt new.txt

goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

K	Depth	Width	Decay	`segmentio/topk` (sec/op)	this package (sec/op)	diff
10	3	256	0.6	641.0n ± 1%	373.5n ± 3%	-41.73% (p=0.000 n=10)
10	3	256	0.8	602.6n ± 1%	387.3n ± 2%	-35.73% (p=0.000 n=10)
10	3	256	0.9	550.4n ± 4%	431.3n ± 2%	-21.63% (p=0.000 n=10)
100	4	460	0.6	763.8n ± 2%	427.0n ± 1%	-44.09% (p=0.000 n=10)
100	4	460	0.8	720.9n ± 2%	459.1n ± 4%	-36.30% (p=0.000 n=10)
100	4	460	0.9	660.6n ± 3%	539.0n ± 22%	-18.41% (p=0.005 n=10)
1000	6	6907	0.6	1107.0n ± 2%	555.9n ± 8%	-49.79% (p=0.000 n=10)
1000	6	6907	0.8	1040.0n ± 4%	613.4n ± 2%	-41.02% (p=0.000 n=10)
1000	6	6907	0.9	936.5n ± 1%	731.5n ± 2%	-21.89% (p=0.000 n=10)
10000	9	92103	0.6	10.693µ ± 2%	1.058µ ± 2%	-90.11% (p=0.000 n=10)
10000	9	92103	0.8	10.667µ ± 1%	1.182µ ± 6%	-88.92% (p=0.000 n=10)
10000	9	92103	0.9	10.724µ ± 1%	1.288µ ± 2%	-87.98% (p=0.000 n=10)
100000	11	1151292	0.6	89.385µ ± 0%	1.674µ ± 1%	-98.13% (p=0.000 n=10)
100000	11	1151292	0.8	89.349µ ± 1%	1.708µ ± 1%	-98.09% (p=0.000 n=10)
100000	11	1151292	0.9	89.284µ ± 1%	1.705µ ± 1%	-98.09% (p=0.000 n=10)

Documentation ¶

Index ¶

func BucketIndex(item string, row, width int) int
func Fingerprint(item string) uint32
type Bucket
type Option
type Sketch
- func New(k int, opts ...Option) *Sketch

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func BucketIndex ¶

func BucketIndex(item string, row, width int) int

BucketIndex returns the counter bucket index for an item in the given row of the sketch.

func Fingerprint ¶

func Fingerprint(item string) uint32

Fingerprint returns an item's fingerprint.

Types ¶

type Bucket ¶

type Bucket struct {
	Fingerprint uint32
	Count       uint32
}

Bucket is a single sketch counter together with the corresponding item's fingerprint.

type Option ¶

type Option func(*Sketch)

func WithDecay ¶

func WithDecay(decay float32) Option

WithDecay sets the counter decay probability on collisions.

func WithDecayLUTSize ¶

func WithDecayLUTSize(n int) Option

WithDecayLUTSize sets the decay look-up table size.

func WithDepth ¶

func WithDepth(depth int) Option

WithDepth sets the depth (number of hash functions) of a sketch.

func WithWidth ¶

func WithWidth(width int) Option

WithWidth sets the width (number of counters per hash function) of a sketch.

type Sketch ¶

type Sketch struct {
	K     int // Keep track of top `K` items in the min-heap..
	Width int // Number of buckets per hash function.
	Depth int // Number of hash functions.

	// `math.Pow(Decay, i)` is the probability that a flow's counter with value `i` is decremented on collision.
	Decay float32
	// Look-up table for powers of `Decay`. The value at `i` is `math.Pow(Decay, i)`
	DecayLUT []float32

	Buckets []Bucket  // Sketch counters.
	Heap    *heap.Min // Top-K min-heap.
}

Sketch is a top-k sketch. The entire structure is serializable using any serialization method - all fields and sub-structs are exported and can be reasonably serialized.

func New ¶

func New(k int, opts ...Option) *Sketch

New returns a sliding top-k sketch with the given `k` (number of top items to keep) and `windowSize` (in ticks).`

The depth defaults to `max(3, log(k))` unless the WithDepth option is set.
The width defaults to `max(256, k*log(k))` unless the WithWidth option is set.
The decay parameter defaults to 0.9 unless the WithDecay option is set.
The decay LUT size defaults to 256 unless the WithDecayLUTSize option is set.

func (*Sketch) Add ¶

func (me *Sketch) Add(item string, increment uint32) bool

Add increments the given item's count by the given increment. Returns whether the item is in the top K.

func (*Sketch) Count ¶

func (me *Sketch) Count(item string) uint32

Count returns the estimated count of the given item.

func (*Sketch) Incr ¶

func (me *Sketch) Incr(item string) bool

Incr counts a single instance of the given item.

func (*Sketch) Iter ¶

func (me *Sketch) Iter(yield func(*heap.Item) bool)

Iter iterates over the top K items.

func (*Sketch) Query ¶

func (me *Sketch) Query(item string) bool

Query returns whether the given item is in the top K items by count.

func (*Sketch) Reset ¶

func (me *Sketch) Reset()

Reset resets the sketch to an empty state.

func (*Sketch) SizeBytes ¶

func (me *Sketch) SizeBytes() int

SizeBytes returns the current size of the sketch in bytes.

func (*Sketch) SortedSlice ¶

func (me *Sketch) SortedSlice() []heap.Item

SortedSlice returns the top K items as a sorted slice.

Source Files ¶

View all Source files

Directories ¶

Path	Synopsis
heap Package heap implements a min-heap that keeps track of the top-K items in a sketch.	Package heap implements a min-heap that keeps track of the top-K items in a sketch.
internal
sizeof
sliding Package sliding implements a sliding-window HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows" [1]	Package sliding implements a sliding-window HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows" [1]

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL

README ¶

topk

Contents

Examples

Top-K Sketch

Sliding-window Top-K Sketch

Benchmarks

Top-K Sketch

Sliding-Window Top-K Sketch

Comparison with segmentio/topk

Documentation ¶

Index ¶

Constants ¶

Variables ¶

Functions ¶

func BucketIndex ¶

func Fingerprint ¶

Types ¶

type Bucket ¶

type Option ¶

func WithDecay ¶

func WithDecayLUTSize ¶

func WithDepth ¶

func WithWidth ¶

type Sketch ¶

func New ¶

func (*Sketch) Add ¶

func (*Sketch) Count ¶

func (*Sketch) Incr ¶

func (*Sketch) Iter ¶

func (*Sketch) Query ¶

func (*Sketch) Reset ¶

func (*Sketch) SizeBytes ¶

func (*Sketch) SortedSlice ¶

Source Files ¶

Directories ¶