topk

package module
v1.1.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 27, 2024 License: MIT Imports: 8 Imported by: 1

README

topk

Coverage

Go Reference Go Report Card

Sliding-window and regular top-K sketches.

import (
	"github.com/keilerkonzept/topk" // plain sketch
	"github.com/keilerkonzept/topk/sliding" // sliding-window sketch
)

Demo application: top K requesting IPs within a sliding time window from a web server access logs dataset

Sliding Top-K Demo Application

Contents

Examples

Top-K Sketch
package main

import (
	"log"
	"github.com/keilerkonzept/topk"
)

func main() {
	// make a new sketch keeping track of k=3 items using 1024x3 = 3072 buckets.
	sketch := topk.New(3, topk.WithWidth(1024), topk.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}
Sliding-window Top-K Sketch
package main

import (
	"log"
	"github.com/keilerkonzept/topk/sliding"
)

func main() {
	// make a new sketch keeping track of k=3 items over a window of the last 60 ticks
	// use width=1024 x depth=3 = 3072 buckets
	sketch := sliding.New(3, 60, sliding.WithWidth(1024), sliding.WithDepth(3))

	log.Println("the sketch takes up", sketch.SizeBytes(), "bytes in memory")

	sketch.Incr("an item")            // count "an item" 1 time
	sketch.Add("an item", 123)        // count "an item" 123 times
	sketch.Tick()                     // advance time by one tick
	sketch.Add("another item", 4)     // count "another item" 4 times
	sketch.Ticks(2)                   // advance time by two ticks
	sketch.Add("an item", 5)          // count "an item" 5 more times
	sketch.Add("yet another item", 6) // count "yet another item" 6 times

	if sketch.Query("an item") {
		// "an item" is in the top K items observed within the last 60 ticks
	}

	_ = sketch.Count("another item") // return the estimated count for "another item"

	// SortedSlice() returns the current top-K entries as a slice of {Fingerprint,Item,Count} structs.
	for _, entry := range sketch.SortedSlice() {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}

	// Iter is an interator over the (*not* sorted) current top-K entries.
	for entry := range sketch.Iter {
		log.Println(entry.Item, "has been counted", entry.Count, "times")
	}
	sketch.Reset() // reset to New() state
}

Benchmarks

Top-K Sketch
goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation K Depth Width time bytes allocs
Add 10 3 1024 358.6 ns/op 0 B/op 0 allocs/op
Add 10 3 8192 375.0 ns/op 0 B/op 0 allocs/op
Add 10 4 1024 449.9 ns/op 0 B/op 0 allocs/op
Add 10 4 8192 436.0 ns/op 0 B/op 0 allocs/op
Add 100 3 1024 371.5 ns/op 0 B/op 0 allocs/op
Add 100 3 8192 387.9 ns/op 0 B/op 0 allocs/op
Add 100 4 1024 452.3 ns/op 0 B/op 0 allocs/op
Add 100 4 8192 471.4 ns/op 0 B/op 0 allocs/op
Incr 10 3 1024 257.2 ns/op 0 B/op 0 allocs/op
Incr 10 3 8192 232.3 ns/op 0 B/op 0 allocs/op
Incr 10 4 1024 249.1 ns/op 0 B/op 0 allocs/op
Incr 10 4 8192 251.2 ns/op 0 B/op 0 allocs/op
Incr 100 3 1024 264.2 ns/op 0 B/op 0 allocs/op
Incr 100 3 8192 227.4 ns/op 0 B/op 0 allocs/op
Incr 100 4 1024 267.1 ns/op 0 B/op 0 allocs/op
Incr 100 4 8192 261.3 ns/op 0 B/op 0 allocs/op
Count 10 3 1024 216.0 ns/op 0 B/op 0 allocs/op
Count 10 3 8192 215.4 ns/op 0 B/op 0 allocs/op
Count 10 4 1024 220.0 ns/op 0 B/op 0 allocs/op
Count 10 4 8192 269.3 ns/op 0 B/op 0 allocs/op
Count 100 3 1024 235.1 ns/op 0 B/op 0 allocs/op
Count 100 3 8192 277.1 ns/op 0 B/op 0 allocs/op
Count 100 4 1024 278.7 ns/op 0 B/op 0 allocs/op
Count 100 4 8192 302.2 ns/op 0 B/op 0 allocs/op
Query 10 3 1024 129.6 ns/op 0 B/op 0 allocs/op
Query 10 3 8192 98.21 ns/op 0 B/op 0 allocs/op
Query 10 4 1024 129.9 ns/op 0 B/op 0 allocs/op
Query 10 4 8192 114.3 ns/op 0 B/op 0 allocs/op
Query 100 3 1024 141.2 ns/op 0 B/op 0 allocs/op
Query 100 3 8192 140.8 ns/op 0 B/op 0 allocs/op
Query 100 4 1024 131.1 ns/op 0 B/op 0 allocs/op
Query 100 4 8192 109.8 ns/op 0 B/op 0 allocs/op
Sliding-Window Top-K Sketch
goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk/sliding
cpu: Apple M1 Pro

The Add benchmark performs random increments in the interval [1,10).

Operation K Depth Width Window size History size time bytes allocs
Add 10 3 1024 100 50 696.9 ns/op 0 B/op 0 allocs/op
Add 10 3 1024 100 100 1051 ns/op 0 B/op 0 allocs/op
Add 10 3 8192 100 50 784.9 ns/op 0 B/op 0 allocs/op
Add 10 3 8192 100 100 1146 ns/op 0 B/op 0 allocs/op
Add 100 3 1024 100 50 712.9 ns/op 0 B/op 0 allocs/op
Add 100 3 1024 100 100 1054 ns/op 0 B/op 0 allocs/op
Add 100 3 8192 100 50 763.3 ns/op 0 B/op 0 allocs/op
Add 100 3 8192 100 100 1139 ns/op 0 B/op 0 allocs/op
Incr 10 3 1024 100 50 434.9 ns/op 0 B/op 0 allocs/op
Incr 10 3 1024 100 100 560.7 ns/op 0 B/op 0 allocs/op
Incr 10 3 8192 100 50 501.1 ns/op 0 B/op 0 allocs/op
Incr 10 3 8192 100 100 728.7 ns/op 0 B/op 0 allocs/op
Incr 100 3 1024 100 50 425.6 ns/op 0 B/op 0 allocs/op
Incr 100 3 1024 100 100 580.0 ns/op 0 B/op 0 allocs/op
Incr 100 3 8192 100 50 497.8 ns/op 0 B/op 0 allocs/op
Incr 100 3 8192 100 100 746.2 ns/op 0 B/op 0 allocs/op
Count 10 3 1024 100 50 228.5 ns/op 0 B/op 0 allocs/op
Count 10 3 1024 100 100 209.3 ns/op 0 B/op 0 allocs/op
Count 10 3 8192 100 50 234.5 ns/op 0 B/op 0 allocs/op
Count 10 3 8192 100 100 230.7 ns/op 0 B/op 0 allocs/op
Count 100 3 1024 100 50 237.5 ns/op 0 B/op 0 allocs/op
Count 100 3 1024 100 100 242.8 ns/op 0 B/op 0 allocs/op
Count 100 3 8192 100 50 246.5 ns/op 0 B/op 0 allocs/op
Count 100 3 8192 100 100 243.4 ns/op 0 B/op 0 allocs/op
Query 10 3 1024 100 50 101.7 ns/op 0 B/op 0 allocs/op
Query 10 3 1024 100 100 104.8 ns/op 0 B/op 0 allocs/op
Query 10 3 8192 100 50 114.0 ns/op 0 B/op 0 allocs/op
Query 10 3 8192 100 100 114.5 ns/op 0 B/op 0 allocs/op
Query 100 3 1024 100 50 135.9 ns/op 0 B/op 0 allocs/op
Query 100 3 1024 100 100 118.5 ns/op 0 B/op 0 allocs/op
Query 100 3 8192 100 50 130.1 ns/op 0 B/op 0 allocs/op
Query 100 3 8192 100 100 131.5 ns/op 0 B/op 0 allocs/op
Tick 10 3 1024 100 50 4191 ns/op 0 B/op 0 allocs/op
Tick 10 3 1024 100 100 7010 ns/op 0 B/op 0 allocs/op
Tick 10 3 8192 100 50 28699 ns/op 0 B/op 0 allocs/op
Tick 10 3 8192 100 100 90979 ns/op 0 B/op 0 allocs/op
Tick 100 3 1024 100 50 6539 ns/op 0 B/op 0 allocs/op
Tick 100 3 1024 100 100 9343 ns/op 0 B/op 0 allocs/op
Tick 100 3 8192 100 50 31349 ns/op 0 B/op 0 allocs/op
Tick 100 3 8192 100 100 87488 ns/op 0 B/op 0 allocs/op
Comparison with segmentio/topk

Using benchstat:

$ go test -run='^$' -bench=BenchmarkSketchAddForComparison -count=10 | tee new.txt
$ go test -run='^$' -bench=BenchmarkSegmentioTopkSample -count=10 | tee old.txt
$ benchstat -row /K,/Depth,/Width,/Decay -col .name old.txt new.txt
goos: darwin
goarch: arm64
pkg: github.com/keilerkonzept/topk
cpu: Apple M1 Pro
K Depth Width Decay segmentio/topk (sec/op) this package (sec/op) diff
10 3 256 0.6 641.0n ± 1% 373.5n ± 3% -41.73% (p=0.000 n=10)
10 3 256 0.8 602.6n ± 1% 387.3n ± 2% -35.73% (p=0.000 n=10)
10 3 256 0.9 550.4n ± 4% 431.3n ± 2% -21.63% (p=0.000 n=10)
100 4 460 0.6 763.8n ± 2% 427.0n ± 1% -44.09% (p=0.000 n=10)
100 4 460 0.8 720.9n ± 2% 459.1n ± 4% -36.30% (p=0.000 n=10)
100 4 460 0.9 660.6n ± 3% 539.0n ± 22% -18.41% (p=0.005 n=10)
1000 6 6907 0.6 1107.0n ± 2% 555.9n ± 8% -49.79% (p=0.000 n=10)
1000 6 6907 0.8 1040.0n ± 4% 613.4n ± 2% -41.02% (p=0.000 n=10)
1000 6 6907 0.9 936.5n ± 1% 731.5n ± 2% -21.89% (p=0.000 n=10)
10000 9 92103 0.6 10.693µ ± 2% 1.058µ ± 2% -90.11% (p=0.000 n=10)
10000 9 92103 0.8 10.667µ ± 1% 1.182µ ± 6% -88.92% (p=0.000 n=10)
10000 9 92103 0.9 10.724µ ± 1% 1.288µ ± 2% -87.98% (p=0.000 n=10)
100000 11 1151292 0.6 89.385µ ± 0% 1.674µ ± 1% -98.13% (p=0.000 n=10)
100000 11 1151292 0.8 89.349µ ± 1% 1.708µ ± 1% -98.09% (p=0.000 n=10)
100000 11 1151292 0.9 89.284µ ± 1% 1.705µ ± 1% -98.09% (p=0.000 n=10)

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func BucketIndex

func BucketIndex(item string, row, width int) int

BucketIndex returns the counter bucket index for an item in the given row of the sketch.

func Fingerprint

func Fingerprint(item string) uint32

Fingerprint returns an item's fingerprint.

Types

type Bucket

type Bucket struct {
	Fingerprint uint32
	Count       uint32
}

Bucket is a single sketch counter together with the corresponding item's fingerprint.

type Option

type Option func(*Sketch)

func WithDecay

func WithDecay(decay float32) Option

WithDecay sets the counter decay probability on collisions.

func WithDecayLUTSize

func WithDecayLUTSize(n int) Option

WithDecayLUTSize sets the decay look-up table size.

func WithDepth

func WithDepth(depth int) Option

WithDepth sets the depth (number of hash functions) of a sketch.

func WithWidth

func WithWidth(width int) Option

WithWidth sets the width (number of counters per hash function) of a sketch.

type Sketch

type Sketch struct {
	K     int // Keep track of top `K` items in the min-heap..
	Width int // Number of buckets per hash function.
	Depth int // Number of hash functions.

	// `math.Pow(Decay, i)` is the probability that a flow's counter with value `i` is decremented on collision.
	Decay float32
	// Look-up table for powers of `Decay`. The value at `i` is `math.Pow(Decay, i)`
	DecayLUT []float32

	Buckets []Bucket  // Sketch counters.
	Heap    *heap.Min // Top-K min-heap.
}

Sketch is a top-k sketch. The entire structure is serializable using any serialization method - all fields and sub-structs are exported and can be reasonably serialized.

func New

func New(k int, opts ...Option) *Sketch

New returns a sliding top-k sketch with the given `k` (number of top items to keep) and `windowSize` (in ticks).`

  • The depth defaults to `max(3, log(k))` unless the WithDepth option is set.
  • The width defaults to `max(256, k*log(k))` unless the WithWidth option is set.
  • The decay parameter defaults to 0.9 unless the WithDecay option is set.
  • The decay LUT size defaults to 256 unless the WithDecayLUTSize option is set.

func (*Sketch) Add

func (me *Sketch) Add(item string, increment uint32) bool

Add increments the given item's count by the given increment. Returns whether the item is in the top K.

func (*Sketch) Count

func (me *Sketch) Count(item string) uint32

Count returns the estimated count of the given item.

func (*Sketch) Incr

func (me *Sketch) Incr(item string) bool

Incr counts a single instance of the given item.

func (*Sketch) Iter

func (me *Sketch) Iter(yield func(*heap.Item) bool)

Iter iterates over the top K items.

func (*Sketch) Query

func (me *Sketch) Query(item string) bool

Query returns whether the given item is in the top K items by count.

func (*Sketch) Reset

func (me *Sketch) Reset()

Reset resets the sketch to an empty state.

func (*Sketch) SizeBytes

func (me *Sketch) SizeBytes() int

SizeBytes returns the current size of the sketch in bytes.

func (*Sketch) SortedSlice

func (me *Sketch) SortedSlice() []heap.Item

SortedSlice returns the top K items as a sorted slice.

Directories

Path Synopsis
Package heap implements a min-heap that keeps track of the top-K items in a sketch.
Package heap implements a min-heap that keeps track of the top-K items in a sketch.
internal
Package sliding implements a sliding-window HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows" [1]
Package sliding implements a sliding-window HeavyKeeper, as described in "A Sketch Framework for Approximate Data Stream Processing in Sliding Windows" [1]

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL