Documentation
¶
Index ¶
- func CosineDistance(a, b []float32) float32
- func CreateSurface(fn DistanceFunc) vectortypes.Surface[vectortypes.F32]
- func EuclideanDistance(a, b []float32) float32
- func NodeSurface[K cmp.Ordered](fn DistanceFunc) vectortypes.Surface[Node[K]]
- func RegisterDistanceFunc(name string, fn DistanceFunc)
- func ToVectorTypesDistanceFunc(fn DistanceFunc) vectortypes.DistanceFunc
- type Analyzer
- type DistanceFunc
- type Graph
- func (g *Graph[K]) Add(nodes ...Node[K]) error
- func (g *Graph[K]) BatchAdd(nodes []Node[K]) error
- func (h *Graph[K]) BatchDelete(keys []K) []bool
- func (g *Graph[K]) BatchSearch(queries []Vector, k int) ([][]Node[K], error)
- func (g *Graph[K]) BatchSearchWithNegatives(queries []Vector, negatives [][]Vector, k int, negWeight float32) ([][]Node[K], error)
- func (h *Graph[K]) Delete(key K) bool
- func (g *Graph[K]) Dims() int
- func (h *Graph[K]) Export(w io.Writer) error
- func (h *Graph[K]) Import(r io.Reader) error
- func (h *Graph[K]) Len() int
- func (h *Graph[K]) Lookup(key K) (Vector, bool)
- func (h *Graph[K]) ParallelSearch(near Vector, k int, numWorkers int) ([]Node[K], error)
- func (h *Graph[K]) Search(near Vector, k int) ([]Node[K], error)
- func (h *Graph[K]) SearchWithNegative(near Vector, negative Vector, k int, negWeight float32) ([]Node[K], error)
- func (h *Graph[K]) SearchWithNegatives(near Vector, negatives []Vector, k int, negWeight float32) ([]Node[K], error)
- func (g *Graph[K]) Validate() error
- type GraphQualityMetrics
- type Node
- type SavedGraph
- type Vector
- type VectorDistance
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func CosineDistance ¶
CosineDistance computes the cosine distance between two vectors.
func CreateSurface ¶ added in v0.4.0
func CreateSurface(fn DistanceFunc) vectortypes.Surface[vectortypes.F32]
CreateSurface creates a vectortypes.Surface from a DistanceFunc
func EuclideanDistance ¶
EuclideanDistance computes the Euclidean distance between two vectors.
func NodeSurface ¶ added in v0.4.0
func NodeSurface[K cmp.Ordered](fn DistanceFunc) vectortypes.Surface[Node[K]]
NodeSurface creates a vectortypes.Surface for Node types
func RegisterDistanceFunc ¶
func RegisterDistanceFunc(name string, fn DistanceFunc)
RegisterDistanceFunc registers a distance function with a name. A distance function must be registered here before a graph can be exported and imported.
func ToVectorTypesDistanceFunc ¶ added in v0.4.0
func ToVectorTypesDistanceFunc(fn DistanceFunc) vectortypes.DistanceFunc
ToVectorTypesDistanceFunc converts a standard DistanceFunc to a vectortypes.DistanceFunc
Types ¶
type Analyzer ¶
Analyzer is a struct that holds a graph and provides methods for analyzing it. It offers no compatibility guarantee as the methods of measuring the graph's health with change with the implementation.
func (*Analyzer[T]) Connectivity ¶
Connectivity returns the average number of edges in the graph for each non-empty layer.
func (*Analyzer[T]) QualityMetrics ¶
func (a *Analyzer[T]) QualityMetrics() GraphQualityMetrics
QualityMetrics calculates various quality metrics for the graph. Returns a struct containing metrics that evaluate the graph's quality.
func (*Analyzer[T]) Topography ¶
Topography returns the number of nodes in each layer of the graph.
type DistanceFunc ¶
DistanceFunc is a function that computes the distance between two vectors.
type Graph ¶
type Graph[K cmp.Ordered] struct { // Distance is the distance function used to compare embeddings. Distance DistanceFunc // Rng is used for level generation. It may be set to a deterministic value // for reproducibility. Note that deterministic number generation can lead to // degenerate graphs when exposed to adversarial inputs. Rng *rand.Rand // M is the maximum number of neighbors to keep for each node. // A good default for OpenAI embeddings is 16. M int // Ml is the level generation factor. // E.g., for Ml = 0.25, each layer is 1/4 the size of the previous layer. Ml float64 // EfSearch is the number of nodes to consider in the search phase. // 20 is a reasonable default. Higher values improve search accuracy at // the expense of memory. EfSearch int // contains filtered or unexported fields }
Graph is a Hierarchical Navigable Small World graph. All public parameters must be set before adding nodes to the graph. K is cmp.Ordered instead of of comparable so that they can be sorted.
Parameter Tuning Guide:
M: The maximum number of connections per node.
- Higher values improve search accuracy but increase memory usage and build time.
- Lower values reduce memory usage but may decrease search accuracy.
- Recommended range: 8-64, with 16 being a good default for most use cases.
- For high-dimensional data (>1000 dimensions), higher M values (32-64) often work better.
- For lower-dimensional data (<100 dimensions), lower M values (8-16) are usually sufficient.
Ml: The level generation factor, controlling the graph hierarchy.
- Determines the probability of a node being promoted to higher layers.
- Lower values (e.g., 0.1) create more layers with fewer nodes in higher layers.
- Higher values (e.g., 0.5) create fewer layers with more nodes in higher layers.
- Recommended range: 0.1-0.5, with 0.25 being a good default.
- For very large graphs (>1M nodes), lower values (0.1-0.2) often work better.
EfSearch: The size of the dynamic candidate list during search.
- Higher values improve search accuracy but increase search time.
- Lower values speed up search but may decrease accuracy.
- Recommended range: 20-200, with 20-50 being good defaults.
- For applications requiring high recall, use higher values (100-200).
- For applications prioritizing speed, use lower values (20-50).
Distance: The distance function used to compare vectors.
- CosineDistance is recommended for normalized embeddings (e.g., OpenAI embeddings).
- EuclideanDistance is recommended for non-normalized embeddings.
- The choice of distance function should match the properties of your data.
Memory Usage: The memory overhead of a graph is approximately:
- Base layer: n * d * 4 bytes (for the vectors themselves)
- Graph structure: n * log(n) * sizeof(key) * M bytes
Where n is the number of vectors, d is the dimensionality, and sizeof(key) is the size of the key in bytes.
func NewGraph ¶
NewGraph returns a new graph with default parameters, roughly designed for storing OpenAI embeddings.
func NewGraphWithConfig ¶
func NewGraphWithConfig[K cmp.Ordered](m int, ml float64, efSearch int, distance DistanceFunc) (*Graph[K], error)
NewGraphWithConfig returns a new graph with the specified parameters. It validates the configuration and returns an error if any parameter is invalid.
func (*Graph[K]) Add ¶
Add inserts nodes into the graph. If another node with the same ID exists, it is replaced.
func (*Graph[K]) BatchAdd ¶
BatchAdd adds multiple nodes to the graph in a single operation. This is more efficient than calling Add multiple times when adding many nodes as it only acquires the lock once.
func (*Graph[K]) BatchDelete ¶
BatchDelete removes multiple nodes from the graph in a single operation. This is more efficient than calling Delete multiple times when deleting many nodes as it only acquires the lock once. Returns a slice of booleans indicating whether each key was successfully deleted.
func (*Graph[K]) BatchSearch ¶
BatchSearch performs multiple searches in a single operation. This is more efficient than calling Search multiple times when performing many searches as it only acquires the lock once.
func (*Graph[K]) BatchSearchWithNegatives ¶
func (g *Graph[K]) BatchSearchWithNegatives(queries []Vector, negatives [][]Vector, k int, negWeight float32) ([][]Node[K], error)
BatchSearchWithNegatives performs multiple searches with negative examples in a single operation. This is more efficient than calling SearchWithNegatives multiple times when performing many searches as it only acquires the lock once.
func (*Graph[K]) Delete ¶
Delete removes a node from the graph by key. It tries to preserve the clustering properties of the graph by replenishing connectivity in the affected neighborhoods.
func (*Graph[K]) Dims ¶
Dims returns the number of dimensions in the graph, or 0 if the graph is empty.
func (*Graph[K]) Import ¶
Import reads the graph from a reader. T must implement io.ReaderFrom. The imported graph does not have to match the exported graph's parameters (except for dimensionality). The graph will converge onto the new parameters.
func (*Graph[K]) ParallelSearch ¶
ParallelSearch finds the k nearest neighbors from the target node using parallel processing. It's optimized for large graphs and high-dimensional data. The numWorkers parameter controls the level of parallelism. If set to 0, it defaults to the number of CPU cores.
func (*Graph[K]) SearchWithNegative ¶
func (h *Graph[K]) SearchWithNegative(near Vector, negative Vector, k int, negWeight float32) ([]Node[K], error)
SearchWithNegative finds the k nearest neighbors from the target node while avoiding vectors similar to the negative example. The negWeight parameter controls the influence of the negative example (0.0 to 1.0, where higher values give more importance to avoiding the negative example).
func (*Graph[K]) SearchWithNegatives ¶
func (h *Graph[K]) SearchWithNegatives(near Vector, negatives []Vector, k int, negWeight float32) ([]Node[K], error)
SearchWithNegatives finds the k nearest neighbors from the target node while avoiding vectors similar to the negative examples. The negWeight parameter controls the influence of the negative examples (0.0 to 1.0, where higher values give more importance to avoiding the negative examples).
type GraphQualityMetrics ¶
type GraphQualityMetrics struct { // NodeCount is the total number of nodes in the graph. NodeCount int // AvgConnectivity is the average number of connections per node in the base layer. AvgConnectivity float64 // ConnectivityStdDev is the standard deviation of connections per node. ConnectivityStdDev float64 // DistortionRatio measures how well the graph preserves distances. // Lower values indicate better distance preservation. DistortionRatio float64 // LayerBalance measures how well balanced the layers are. // Values closer to 1.0 indicate better balance. LayerBalance float64 // GraphHeight is the number of layers in the graph. GraphHeight int }
GraphQualityMetrics contains various metrics that evaluate the quality of the graph.
type SavedGraph ¶
SavedGraph is a wrapper around a graph that persists changes to a file upon calls to Save. It is more convenient but less powerful than calling Graph.Export and Graph.Import directly.
func LoadSavedGraph ¶
func LoadSavedGraph[K cmp.Ordered](path string) (*SavedGraph[K], error)
LoadSavedGraph opens a graph from a file, reads it, and returns it.
If the file does not exist (i.e. this is a new graph), the equivalent of NewGraph is returned.
It does not hold open a file descriptor, so SavedGraph can be forgotten without ever calling Save.
func (*SavedGraph[K]) Save ¶
func (g *SavedGraph[K]) Save() error
Save writes the graph to the file.
type VectorDistance ¶ added in v0.4.0
type VectorDistance[T any] struct { Surface vectortypes.Surface[T] }
VectorDistance is a generic distance calculator that can work with any type that can be mapped to a vectortypes.F32
func NewNodeDistance ¶ added in v0.4.0
func NewNodeDistance[K cmp.Ordered](fn DistanceFunc) *VectorDistance[Node[K]]
NewNodeDistance creates a VectorDistance for Node types
func NewVectorDistance ¶ added in v0.4.0
func NewVectorDistance[T any](surface vectortypes.Surface[T]) *VectorDistance[T]
NewVectorDistance creates a new VectorDistance with the given surface
func (*VectorDistance[T]) Distance ¶ added in v0.4.0
func (vd *VectorDistance[T]) Distance(a, b T) float32
Distance calculates the distance between two vectors
Directories
¶
Path | Synopsis |
---|---|
examples
|
|
hnsw-extensions
|
|
facets
Package facets provides extensions to the HNSW library for faceted search capabilities.
|
Package facets provides extensions to the HNSW library for faceted search capabilities. |
facets/examples/product_search
Package examples provides example implementations of the hnsw-extensions.
|
Package examples provides example implementations of the hnsw-extensions. |
hybrid
Package hybrid provides hybrid index structures that combine HNSW with complementary indexing approaches to overcome limitations in specific scenarios.
|
Package hybrid provides hybrid index structures that combine HNSW with complementary indexing approaches to overcome limitations in specific scenarios. |
meta
Package meta provides extensions to the HNSW library for storing and retrieving JSON metadata alongside vectors.
|
Package meta provides extensions to the HNSW library for storing and retrieving JSON metadata alongside vectors. |
meta/example
Package example provides examples of using the metadata extension for HNSW.
|
Package example provides examples of using the metadata extension for HNSW. |
Package vector provides optimized vector operations for the HNSW library.
|
Package vector provides optimized vector operations for the HNSW library. |
Package vectortypes provides common types for vector operations
|
Package vectortypes provides common types for vector operations |