Overview
Package storage defines the storage abstractions needed for Gaby: DB, a basic key-value store, and VectorDB, a vector database. The storage needs are intentionally minimal (avoiding, for example, a requirement on SQL), to admit as many implementations as possible.
Functions
func Fmt
Fmt formats data for printing, first trying ordered.DecodeFmt in case data is an [ordered encoding], then trying a backquoted string if possible (handling simple JSON data), and finally resorting to strconv.QuoteToASCII.
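The fallback order in Fmt can be sketched as follows. This is a hypothetical approximation, not the package's code: fmtSketch skips the ordered.DecodeFmt step entirely and only shows the choice between a backquoted raw string and strconv.QuoteToASCII. The "printable and contains no backquote" test is an assumption about what "if possible" means.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// fmtSketch is a hypothetical approximation of Fmt's last two steps.
// It omits the ordered.DecodeFmt attempt.
func fmtSketch(data []byte) string {
	s := string(data)
	// A raw (backquoted) string can hold s only if s has no backquote
	// and every rune is printable (e.g. simple JSON like {"a":1}).
	if !strings.Contains(s, "`") && isPrintable(s) {
		return "`" + s + "`"
	}
	// Otherwise fall back to an ASCII-only quoted Go string.
	return strconv.QuoteToASCII(s)
}

// isPrintable reports whether s is valid UTF-8 made of printable runes.
// Invalid bytes decode as unicode.ReplacementChar and are rejected.
func isPrintable(s string) bool {
	for _, r := range s {
		if r == unicode.ReplacementChar || !unicode.IsPrint(r) {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(fmtSketch([]byte(`{"a":1}`)))   // `{"a":1}`
	fmt.Println(fmtSketch([]byte("tab\there"))) // "tab\there"
}
```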
func JSON
JSON converts x to JSON and returns the result. It panics if there is any error converting x to JSON. Since whether x can be converted to JSON depends almost entirely on its type, a marshaling error indicates a bug at the call site.
(The exception is certain malformed UTF-8 and floating-point infinity and NaN. Code must be careful not to use JSON with those.)
func Panic
Panic panics with the text formatting of its arguments. It is meant to be called for database errors or corruption, which have been defined to be impossible. (See the DB documentation.)
Panic is expected to be used by DB implementations. DB clients should use the [DB.Panic] method instead.
func TestVectorDB
Types
type Batch
type Batch interface {
	// Delete deletes any value associated with key.
	// Delete of an unset key is a no-op.
	Delete(key []byte)

	// DeleteRange deletes all key-value pairs with start ≤ key ≤ end.
	DeleteRange(start, end []byte)

	// Set sets the value associated with key to val.
	Set(key, val []byte)

	// MaybeApply calls Apply if the batch is getting close to full.
	// Every Batch has a limit to how many operations can be batched,
	// so in a bulk operation where atomicity of the entire batch is not a concern,
	// calling MaybeApply gives the Batch implementation
	// permission to flush the batch at specific “safe points”.
	// A typical limit for a batch is about 100MB worth of logged operations.
	// MaybeApply reports whether it called Apply.
	MaybeApply() bool

	// Apply applies all the batched operations to the underlying DB
	// as a single atomic unit.
	// When Apply returns, the Batch is an empty batch ready for
	// more operations.
	Apply()
}
A Batch accumulates database mutations that are applied to a DB as a single atomic operation. Applying bulk operations in a batch is also more efficient than making individual DB method calls. The batched operations apply in the order they are made. For example, Set("a", "b") followed by Delete("a") is the same as Delete("a"), while Delete("a") followed by Set("a", "b") is the same as Set("a", "b").
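The ordering rule can be checked with a toy batch. memBatch is a hypothetical, map-backed illustration (not a real DB implementation and not atomic): it records each Set and Delete and replays them in call order on Apply, which reproduces both examples above.

```go
package main

import "fmt"

// op is one recorded mutation.
type op struct {
	key string
	val []byte
	del bool // true for Delete, false for Set
}

// memBatch is a hypothetical in-memory batch that records Set and
// Delete calls and replays them in call order on Apply.
type memBatch struct {
	db  map[string][]byte
	ops []op
}

func (b *memBatch) Set(key, val []byte) {
	// Copy val so later caller mutations do not affect the batch.
	b.ops = append(b.ops, op{key: string(key), val: append([]byte(nil), val...)})
}

func (b *memBatch) Delete(key []byte) {
	b.ops = append(b.ops, op{key: string(key), del: true})
}

// Apply replays the ops in order, then empties the batch for reuse.
// A plain map gives no atomicity; a real DB would supply that.
func (b *memBatch) Apply() {
	for _, o := range b.ops {
		if o.del {
			delete(b.db, o.key)
		} else {
			b.db[o.key] = o.val
		}
	}
	b.ops = b.ops[:0]
}

func main() {
	db := map[string][]byte{"a": []byte("old")}
	b := &memBatch{db: db}

	b.Set([]byte("a"), []byte("b"))
	b.Delete([]byte("a"))
	b.Apply()
	_, ok := db["a"]
	fmt.Println(ok) // false: same as Delete("a") alone

	b.Delete([]byte("a"))
	b.Set([]byte("a"), []byte("b"))
	b.Apply()
	fmt.Println(string(db["a"])) // b: same as Set("a", "b") alone
}
```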
type DB
type DB interface {
	// Lock acquires a lock on the given name, which need not exist in the database.
	// After a successful Lock(name),
	// any other call to Lock(name) from any other client of the database
	// (including in another process, for shared databases)
	// must block until Unlock(name) has been called.
	// In a shared database, a lock may also unlock
	// when the client disconnects or times out.
	Lock(name string)

	// Unlock releases the lock with the given name,
	// which the caller must have locked.
	Unlock(name string)

	// Set sets the value associated with key to val.
	Set(key, val []byte)

	// Get looks up the value associated with key.
	// If there is no entry for key in the database, Get returns nil, false.
	// Otherwise it returns val, true.
	Get(key []byte) (val []byte, ok bool)

	// Scan returns an iterator over all key-value pairs with start ≤ key ≤ end.
	// The second value in each iteration pair is a function returning the value,
	// not the value itself:
	//
	//	for key, getVal := range db.Scan([]byte("aaa"), []byte("zzz")) {
	//		val := getVal()
	//		fmt.Printf("%q: %q\n", key, val)
	//	}
	//
	// In iterations that only need the keys or only need the values for a subset of keys,
	// some DB implementations may avoid work when the value function is not called.
	Scan(start, end []byte) iter.Seq2[[]byte, func() []byte]

	// Delete deletes any value associated with key.
	// Delete of an unset key is a no-op.
	Delete(key []byte)

	// DeleteRange deletes all key-value pairs with start ≤ key ≤ end.
	DeleteRange(start, end []byte)

	// Batch returns a new [Batch] that accumulates database mutations
	// to apply in an atomic operation. In addition to the atomicity, using a
	// Batch for bulk operations is more efficient than making each
	// change using repeated calls to DB's Set, Delete, and DeleteRange methods.
	Batch() Batch

	// Flush flushes DB changes to permanent storage.
	// Flush must be called before the process crashes or exits,
	// or else any changes since the previous Flush may be lost.
	Flush()

	// Close closes the database.
	// Like the other routines, it panics if an error happens,
	// so there is no error result.
	Close()

	// Panic logs the error message and args using the database's slog.Logger
	// and then panics with the text formatting of its arguments.
	// It is meant to be called when database corruption or other
	// database-related “can't happen” conditions have been detected.
	Panic(msg string, args ...any)
}
A DB is a key-value database.
DB operations are assumed not to fail. They panic, intending to take down the program, if there is an error accessing the database. The assumption is that the program cannot possibly continue without the database, since that's where all the state is stored. Similarly, clients of DB conventionally panic if the database returned corrupted data. Code using multiple parallel database operations can recover at the outermost calls. Clients of DB
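Recovering "at the outermost calls" can look like the following sketch. runAll is a hypothetical helper, not part of the package: each goroutine defers a recover at its top level, so a database panic in one parallel operation becomes an error value instead of taking down the process.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll is a hypothetical helper: it runs each task in its own goroutine
// and recovers a panic at the outermost call of that goroutine, turning
// it into an error instead of crashing the whole program.
func runAll(tasks []func()) []error {
	errs := make([]error, len(tasks))
	var wg sync.WaitGroup
	for i, task := range tasks {
		wg.Add(1)
		go func(i int, task func()) {
			defer wg.Done()
			defer func() {
				if r := recover(); r != nil {
					// Each goroutine writes only its own slot: no data race.
					errs[i] = fmt.Errorf("task %d panicked: %v", i, r)
				}
			}()
			task()
		}(i, task)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := runAll([]func(){
		func() { /* a successful database operation */ },
		func() { panic("db: corrupt record") },
	})
	fmt.Println(errs[0]) // <nil>
	fmt.Println(errs[1]) // task 1 panicked: db: corrupt record
}
```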
type MemLocker
type MemLocker struct {
// contains filtered or unexported fields
}
A MemLocker is a single-process implementation of the database Lock and Unlock methods, suitable when only one process accesses the database at a time.
type VectorBatch
type VectorBatch interface {
	// Set sets the vector associated with the given document ID to vec.
	Set(id string, vec llm.Vector)

	// MaybeApply calls Apply if the VectorBatch is getting close to full.
	// Every VectorBatch has a limit to how many operations can be batched,
	// so in a bulk operation where atomicity of the entire batch is not a concern,
	// calling MaybeApply gives the VectorBatch implementation
	// permission to flush the batch at specific “safe points”.
	// A typical limit for a batch is about 100MB worth of logged operations.
	//
	// MaybeApply reports whether it called Apply.
	MaybeApply() bool

	// Apply applies all the batched operations to the underlying VectorDB
	// as a single atomic unit.
	// When Apply returns, the VectorBatch is an empty batch ready for
	// more operations.
	Apply()
}
A VectorBatch accumulates vector database mutations that are applied to a VectorDB in a single atomic operation. Applying bulk operations in a batch is also more efficient than making individual VectorDB method calls. The batched operations apply in the order they are made.
type VectorDB
type VectorDB interface {
	// Set sets the vector associated with the given document ID to vec.
	Set(id string, vec llm.Vector)

	// Get gets the vector associated with the given document ID.
	// If no such document exists, Get returns nil, false.
	// If a document exists, Get returns vec, true.
	Get(id string) (llm.Vector, bool)

	// Batch returns a new [VectorBatch] that accumulates
	// vector database mutations to apply in an atomic operation.
	// It is more efficient than repeated calls to Set.
	Batch() VectorBatch

	// Search searches the database for the n vectors
	// most similar to vec, returning the document IDs
	// and similarity scores.
	Search(vec llm.Vector, n int) []VectorResult

	// Flush flushes storage to disk.
	Flush()
}
A VectorDB is a vector database that implements nearest-neighbor search over embedding vectors corresponding to documents.
func MemVectorDB
MemVectorDB returns a VectorDB that stores its vectors in db but uses a cached, in-memory copy to implement Search using a brute-force scan.
The namespace is incorporated into the keys used in the underlying db, to allow multiple vector databases to be stored in a single DB.
When MemVectorDB is called, it reads all previously stored vectors from db; after that, changes must be made using the MemVectorDB Set method.
A MemVectorDB requires approximately 3kB of memory per stored vector.
The db keys used by a MemVectorDB have the form
ordered.Encode("llm.Vector", namespace, id)
where id is the document ID passed to Set.
type VectorResult
type VectorResult struct {
	ID    string  // document ID
	Score float64 // similarity score in range [0, 1]; 1 is exact match
}
A VectorResult is a single document returned by a VectorDB search.