massv2

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 7, 2025 License: MIT Imports: 5 Imported by: 0

README

Mueen's Algorithm for Similarity Search (MASS)

This module implements version 2 of Mueen's Algorithm for Similarity Search (MASS_V2). MASS is an algorithm to create Distance Profile of a query to a long time series: given a query subsequence Q of length m and a long time-series T of length n, MASS computes the z-normalized Euclidean distance between Q and every subsequence in T of length m.

The key idea underlying MASS is that z-normalized Euclidean distance reduces to a formula involving sliding dot products of Q against subsequences of T plus mean and variance terms. Sliding dot products can be computed in O(n log n) using fast convolution via FFT (Fast Fourier Transform). Consequently, MASS_V2 computes the entire Distance Profile in O(n log n), independent of data distribution.

Additionally, this module provides convenience functions to find the best match, or the top K matches, in the time-series.

Citation

Abdullah Mueen, Sheng Zhong, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2022), "The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance."

URL: http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html

Algorithm Complexity

  • Time: O(n log n)
    • Dominated by FFT operations
  • Space: O(n)
    • Requires memory for the entire time-series, plus FFT working space

Documentation

Overview

Package massv2 implements version 2 of Mueen's Algorithm for Similarity Search (MASS_V2). MASS is an algorithm to create Distance Profile of a query to a long time series: given a query subsequence Q of length m and a long time-series T of length n, MASS computes the z-normalized Euclidean distance between Q and every subsequence in T of length m. Additionally, this package provides convenience functions to find the best match, or the top K matches, in the time-series.

Citation: Abdullah Mueen, Sheng Zhong, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta and Eamonn Keogh (2022), The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance, URL: http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrEmptyQuery                = errors.New("empty or nil query")
	ErrEmptyTimeSeries           = errors.New("empty or nil time-series")
	ErrQueryHasZeroVariance      = errors.New("query has zero variance (all values are the same)")
	ErrQueryLongerThanTimeSeries = errors.New("query length exceeds time-series length")
)
View Source
var (
	ErrKMustBePositive = errors.New("k must be a positive integer")
)

Functions

func FindBestMatch

func FindBestMatch(timeSeries, query []float64) (idx int, dist float64, err error)

FindBestMatch uses MASSV2 to identify the starting index of the subsequence in the time-series that best matches the query. FindBestMatch returns the index of the best match and its z-normalized Euclidean Distance from the query.

When MASSV2 returns an error while processing timeSeries and query, FindBestMatch returns that error.

func FindTopKMatches

func FindTopKMatches(
	timeSeries,
	query []float64,
	k int,
) (indices []int, dists []float64, err error)

FindTopKMatches uses MASSV2 to identify the k subsequences in the time-series that best match the query. FindTopKMatches returns the indices of the top k matches and their z-normalized Euclidean Distances from the query.

When MASSV2 returns an error while processing timeSeries and query, FindTopKMatches returns that error.

func MASSV2

func MASSV2(timeSeries, query []float64) (distances []float64, err error)

MASSV2 computes and returns the z-normalized Euclidean distance between the query sequence and every subsequence of the same length in the time-series. The algorithm operates in O(n log n) time with a space complexity of O(n).

The time-series and query sequence must not be empty or nil. The query cannot have zero variance (constant values) or be longer than the time-series. An error is returned when the inputs are invalid.

Types

This section is empty.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL