golightrag

package module
v0.1.2
Published: Apr 5, 2025 License: MIT Imports: 14 Imported by: 0

README

go-light-rag


A Go library implementation of LightRAG - an advanced Retrieval-Augmented Generation (RAG) system that uniquely combines vector databases with graph database relationships to enhance knowledge retrieval.

Overview

go-light-rag is a Go library that implements the core components of the LightRAG architecture rather than providing an end-to-end system. The library centers around two essential functions:

  • Insert: Add documents to the knowledge base with flexible processing options
  • Query: Retrieve contextually relevant information while preserving raw results

Unlike the original Python implementation, which offers a complete RAG solution, this library deliberately separates the document processing pipeline from prompt engineering concerns. This approach gives developers:

  1. Full control over document insertion workflows
  2. Direct access to retrieved context data
  3. Freedom to craft custom prompts tailored to specific use cases
  4. Ability to integrate with existing Go applications and workflows

The minimalist API combined with powerful extension points makes go-light-rag ideal for developers who need the benefits of hybrid retrieval without being constrained by predefined prompt templates or processing pipelines.

Architecture

go-light-rag is built on well-defined interfaces that enable flexibility, extensibility, and modular design. These interfaces define the contract between components, allowing you to replace or extend functionality without modifying the core logic.

1. Language Models (LLM)

The LLM interface abstracts different language model providers. The following implementations are included:

  • OpenAI: Full support for GPT models
  • Anthropic: Integration with Claude models
  • Ollama: Self-hosted option for open-source models
  • OpenRouter: Unified access to multiple model providers

Custom implementations can be created by implementing the LLM interface, which requires only a Chat() method.
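
A custom provider only has to satisfy that single method. As a minimal sketch, assuming nothing beyond the interface shown in the documentation below, a hypothetical implementation might look like this:

// echoLLM is a hypothetical LLM used only to illustrate the interface shape;
// a real implementation would call a model provider's API inside Chat.
type echoLLM struct{}

// Chat satisfies the golightrag.LLM interface. Messages alternate between
// user (even index) and assistant (odd index).
func (echoLLM) Chat(messages []string) (string, error) {
    if len(messages) == 0 {
        return "", nil
    }
    // Echo the most recent message back as the "model" response.
    return "echo: " + messages[len(messages)-1], nil
}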

2. Storage

The library defines three storage interfaces:

  • GraphStorage: Manages entity and relationship data
  • VectorStorage: Provides semantic search capabilities
  • KeyValueStorage: Stores original document chunks

Implementations Provided

  • GraphStorage: Neo4J
  • VectorStorage: Chromem, Milvus
  • KeyValueStorage: BoltDB, Redis

You can implement any of these interfaces to use different storage solutions.
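
For example, a minimal in-memory KeyValueStorage could be sketched as follows (the memoryKV type is hypothetical and useful mainly for tests; concurrency safety is omitted for brevity):

// memoryKV is a hypothetical in-memory KeyValueStorage implementation.
type memoryKV struct {
    sources map[string]golightrag.Source
}

// KVSource retrieves a stored chunk by its ID.
func (m *memoryKV) KVSource(id string) (golightrag.Source, error) {
    src, ok := m.sources[id]
    if !ok {
        return golightrag.Source{}, fmt.Errorf("source %q not found", id)
    }
    return src, nil
}

// KVUpsertSources stores or replaces chunks keyed by their IDs.
func (m *memoryKV) KVUpsertSources(sources []golightrag.Source) error {
    if m.sources == nil {
        m.sources = make(map[string]golightrag.Source)
    }
    for _, s := range sources {
        m.sources[s.ID] = s
    }
    return nil
}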

3. Handlers

Handlers control document and query processing:

  • DocumentHandler: Controls chunking, entity extraction, and processing
  • QueryHandler: Manages keyword extraction and prompt structuring

Included Handlers
  • Default: General-purpose text document processing that follows the official Python implementation. The zero value of Default uses the same configuration as the Python implementation.
  • Semantic: Advanced handler that extends Default to create semantically meaningful chunks by using an LLM to identify natural content boundaries rather than fixed token counts. This improves RAG quality at the cost of additional LLM calls.
  • Go: Specialized handler for Go source code that uses AST parsing to divide code into logical sections such as functions, types, and declarations.

Custom handlers can embed existing handlers and override only specific methods.
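
For instance, a custom handler might embed handler.Default and override only the chunking step. The markdownHandler type below is hypothetical and simply illustrates the embedding pattern:

// markdownHandler is a hypothetical handler that reuses Default for every
// DocumentHandler method except ChunksDocument.
type markdownHandler struct {
    handler.Default
}

// ChunksDocument overrides the default chunking. Here it only trims the
// content before delegating, but a real handler could split on Markdown
// headings instead of token counts.
func (h markdownHandler) ChunksDocument(content string) ([]golightrag.Source, error) {
    return h.Default.ChunksDocument(strings.TrimSpace(content))
}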

Usage Examples

Document Insertion
// Initialize LLM
llm := llm.NewOpenAI(apiKey, model, params, logger)

// Initialize storage components
graphDB, _ := storage.NewNeo4J("bolt://localhost:7687", "neo4j", "password")

embeddingFunc := storage.EmbeddingFunc(chromem.NewEmbeddingFuncOpenAI("open_ai_key", chromem.EmbeddingModelOpenAI3Large))

// Option 1: Use ChromeM for vector storage
vecDB, _ := storage.NewChromem("vec.db", 5, embeddingFunc)

// Option 2: Use Milvus for vector storage instead
// vectorDim := 3072 // Embedding dimension (3072 for text-embedding-3-large, 1536 for text-embedding-3-small)
// milvusCfg := &milvusclient.ClientConfig{
// 	Address:  os.Getenv("MILVUS_ADDRESS"),
// 	Username: os.Getenv("MILVUS_USER"),
// 	Password: os.Getenv("MILVUS_PASSWORD"),
// 	DBName:   os.Getenv("MILVUS_DB"),
// }
// vecDB, _ := storage.NewMilvus(milvusCfg, 5, vectorDim, embeddingFunc)

// Use BoltDB for key-value storage
kvDB, _ := storage.NewBolt("kv.db")
// Or use Redis instead
// kvDB, _ := storage.NewRedis("localhost:6379", "", 0)

store := storageWrapper{
    Bolt:    kvDB,
    // Redis:   kvDB, // if Redis is used instead of BoltDB
    Chromem: vecDB,
    // Milvus:  vecDB, // if Milvus (Option 2) is used instead of Chromem
    Neo4J:   graphDB,
}

// Use default document handler with zero values to match Python implementation behavior
handler := handler.Default{}

// Insert a document
doc := golightrag.Document{
    ID:      "unique-document-id",
    Content: documentContent,
}

err := golightrag.Insert(doc, handler, store, llm, logger)
if err != nil {
    log.Fatalf("Error inserting document: %v", err)
}
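
Note that storageWrapper above is not a library type; it is a small composite you define yourself so that one value satisfies golightrag.Storage. A minimal sketch, assuming the constructors above return the concrete types storage.Bolt, storage.Chromem, and storage.Neo4J (switch to pointer embedding if they return pointers):

// storageWrapper satisfies golightrag.Storage by embedding one implementation
// of each storage interface; embedding promotes their methods.
type storageWrapper struct {
    storage.Bolt    // KeyValueStorage
    storage.Chromem // VectorStorage
    storage.Neo4J   // GraphStorage
}
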
Query Processing
// Create a conversation with the user's query
conversation := []golightrag.QueryConversation{
    {
        Role:    golightrag.RoleUser,
        Message: "What do you know about the main characters?",
    },
}

// Execute the query
result, err := golightrag.Query(conversation, handler, store, llm, logger)
if err != nil {
    log.Fatalf("Error processing query: %v", err)
}

// Access the retrieved context
fmt.Printf("Found %d local entities and %d global entities\n", 
    len(result.LocalEntities), len(result.GlobalEntities))

// Process source documents (SourceContext exposes Content and RefCount)
for _, source := range result.LocalSources {
    fmt.Printf("References: %d\nContent: %s\n\n",
        source.RefCount, source.Content)
}

// Or use the convenient String method for formatted results
fmt.Println(result)

Handler Configuration Tips

  1. Choose the right handler for your documents:

    • Default for general text
    • Semantic for improved content comprehension where cost is less important
    • Go for Go source code
    • Create custom handlers for specialized content
  2. Optimize chunking parameters:

    • Larger chunks provide more context but may exceed token limits
    • Smaller chunks process faster but may lose context
    • Balance overlap to maintain concept continuity
    • Consider the Semantic handler for content where natural boundaries are important
  3. When using the Semantic handler (see the sketch after this list):

    • Set appropriate TokenThreshold based on your LLM context window
    • Configure MaxChunkSize to limit individual chunk sizes
    • Provide a reliable LLM instance as it's required for semantic analysis
  4. Configure concurrency appropriately:

    • Higher concurrency speeds up processing but increases resource usage
    • Balance according to your hardware capabilities and LLM rate limits
  5. Customize entity types:

    • Define entity types relevant to your domain
    • Be specific enough to capture important concepts
    • Be general enough to avoid excessive fragmentation
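
As a rough sketch of tip 3, a Semantic handler configuration might look like the following. The field names TokenThreshold and MaxChunkSize come from the tips above, but the exact struct layout is an assumption, so check the handler package for the actual fields:

// Hypothetical Semantic handler configuration; field names follow the tips
// above and may differ from the actual handler package.
semHandler := handler.Semantic{
    TokenThreshold: 2048, // stay well inside the LLM's context window
    MaxChunkSize:   1200, // cap the size of any single chunk
    LLM:            llm,  // required: the LLM identifies natural content boundaries
}
// Pass semHandler to golightrag.Insert in place of the Default handler.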

Benchmarks

go-light-rag includes benchmark tests comparing its performance against a NaiveRAG implementation. The benchmarks use the same evaluation prompts as the Python implementation but with different documents and queries.

For detailed benchmark results and methodology, visit the benchmark directory.

Examples and Documentation

For more detailed examples, please refer to the examples directory.

Documentation


Constants

const (
	// RoleUser represents the user role in a conversation.
	RoleUser = "user"
	// RoleAssistant represents the assistant role in a conversation.
	RoleAssistant = "assistant"
)
const GraphFieldSeparator = "<SEP>"

GraphFieldSeparator is a constant used to separate fields in a graph.
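
For instance, assuming an implementation joins several source chunk IDs into a single field with this separator (how the field is populated is an assumption here), the parts can be recovered with a plain split:

// entity is a golightrag.GraphEntity value; SourceIDs is assumed to have been
// joined with GraphFieldSeparator.
ids := strings.Split(entity.SourceIDs, golightrag.GraphFieldSeparator)
for _, id := range ids {
	fmt.Println("source chunk:", id)
}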

Variables

var (
	// ErrEntityNotFound is returned when an entity is not found in the storage.
	ErrEntityNotFound = errors.New("entity not found")
	// ErrRelationshipNotFound is returned when a relationship is not found in the storage.
	ErrRelationshipNotFound = errors.New("relationship not found")
)

Functions

func Insert

func Insert(doc Document, handler DocumentHandler, storage Storage, llm LLM, logger *slog.Logger) error

Insert processes a document and stores it in the provided storage. It chunks the document content, extracts entities and relationships using the provided document handler, and stores the results in the appropriate storage. It returns an error if any step in the process fails.

Types

type Document

type Document struct {
	ID      string
	Content string
}

Document represents a text document to be processed and stored. It contains an ID for unique identification and the content to be analyzed.

type DocumentHandler

type DocumentHandler interface {
	// ChunksDocument splits a document's content into smaller, manageable chunks.
	// It returns a slice of Source objects representing the document chunks,
	// without assigning IDs (IDs will be generated in the Insert function).
	ChunksDocument(content string) ([]Source, error)
	// EntityExtractionPromptData returns the data needed to generate prompts for extracting
	// entities and relationships from text content.
	// The implementation doesn't need to fill the Input field, as it will be filled in the
	// Insert function.
	EntityExtractionPromptData() EntityExtractionPromptData
	// MaxRetries determines the maximum number of retries allowed for the Chat function.
	// This is mainly used when extracting entities and relationships from text content,
	// because the LLM sometimes returns output in an incorrect format.
	MaxRetries() int
	// ConcurrencyCount determines the number of concurrent requests to the LLM.
	ConcurrencyCount() int
	// BackoffDuration determines the backoff duration between retries.
	BackoffDuration() time.Duration
	// GleanCount returns the maximum number of additional extraction attempts
	// to perform after the initial entity extraction to find entities that might
	// have been missed.
	GleanCount() int
	// MaxSummariesTokenLength returns the maximum token length allowed for entity
	// and relationship descriptions before they need to be summarized by the LLM.
	MaxSummariesTokenLength() int
}

DocumentHandler provides an interface for processing documents and interacting with language models.

type EntityContext

type EntityContext struct {
	Name        string
	Type        string
	Description string
	RefCount    int
	CreatedAt   time.Time
}

EntityContext represents an entity retrieved from the knowledge graph with its context.

func (EntityContext) String

func (e EntityContext) String() string

String returns a CSV-formatted string representation of the EntityContext.

type EntityExtractionPromptData

type EntityExtractionPromptData struct {
	Goal        string
	EntityTypes []string
	Language    string
	Examples    []EntityExtractionPromptExample

	Input string
}

EntityExtractionPromptData contains the data needed to generate prompts for extracting entities and relationships from text content. It includes the goal of extraction, valid entity types, target language, example extractions, and the input text to be processed.

type EntityExtractionPromptEntityOutput

type EntityExtractionPromptEntityOutput struct {
	Name        string
	Type        string
	Description string
}

EntityExtractionPromptEntityOutput represents the expected output format for an entity identified during extraction. It includes the entity's name, type, and description.

type EntityExtractionPromptExample

type EntityExtractionPromptExample struct {
	EntityTypes          []string
	Text                 string
	EntitiesOutputs      []EntityExtractionPromptEntityOutput
	RelationshipsOutputs []EntityExtractionPromptRelationshipOutput
}

EntityExtractionPromptExample provides sample inputs and outputs for demonstrating entity extraction to language models. It includes sample text content along with the expected entities and relationships that should be extracted from the text.

type EntityExtractionPromptRelationshipOutput

type EntityExtractionPromptRelationshipOutput struct {
	SourceEntity string
	TargetEntity string
	Description  string
	Keywords     []string
	Strength     float64
}

EntityExtractionPromptRelationshipOutput represents the expected output format for a relationship identified between entities during extraction. It includes source and target entities, description, relevant keywords, and a strength value indicating the relationship's importance.

type GraphEntity

type GraphEntity struct {
	Name         string `json:"entity_name"`
	Type         string `json:"entity_type"`
	Descriptions string `json:"entity_description"`
	SourceIDs    string
	CreatedAt    time.Time
}

GraphEntity represents an entity in the knowledge graph. It contains information about the entity's name, type, descriptions, sources, and creation timestamp.

type GraphRelationship

type GraphRelationship struct {
	SourceEntity string   `json:"source_entity"`
	TargetEntity string   `json:"target_entity"`
	Weight       float64  `json:"relationship_strength"`
	Descriptions string   `json:"relationship_description"`
	Keywords     []string `json:"relationship_keywords"`
	SourceIDs    string
	CreatedAt    time.Time
}

GraphRelationship represents a relationship between two entities in the knowledge graph. It contains information about the source and target entities, relationship weight, descriptions, keywords, sources, and creation timestamp.

type GraphStorage

type GraphStorage interface {
	// GraphEntity retrieves a single entity by name from the graph storage.
	// Returns ErrEntityNotFound if the entity doesn't exist.
	GraphEntity(name string) (GraphEntity, error)
	// GraphRelationship retrieves a relationship between sourceEntity and targetEntity.
	// Returns ErrRelationshipNotFound if the relationship doesn't exist.
	GraphRelationship(sourceEntity, targetEntity string) (GraphRelationship, error)

	// GraphUpsertEntity creates a new entity or updates an existing entity in the graph storage.
	// If the entity already exists, it should merge the new data with existing data.
	GraphUpsertEntity(entity GraphEntity) error
	// GraphUpsertRelationship creates a new relationship or updates an existing relationship
	// between two entities in the graph storage.
	// If the relationship already exists, it should merge the new data with existing data.
	GraphUpsertRelationship(relationship GraphRelationship) error

	// GraphEntities batch retrieves multiple entities by their names.
	// Returns a map with entity names as keys and entity objects as values.
	// If an entity doesn't exist, it should be omitted from the result map.
	GraphEntities(names []string) (map[string]GraphEntity, error)
	// GraphRelationships batch retrieves multiple relationships by their source-target pairs.
	// Returns a map with composite keys (formatted as "source-target") as keys and
	// relationship objects as values.
	// If a relationship doesn't exist, it should be omitted from the result map.
	GraphRelationships(pairs [][2]string) (map[string]GraphRelationship, error)

	// GraphCountEntitiesRelationships counts the number of relationships each entity has.
	// Returns a map with entity names as keys and relationship counts as values.
	// This is used to determine entity importance during queries.
	GraphCountEntitiesRelationships(names []string) (map[string]int, error)
	// GraphRelatedEntities finds entities directly connected to the specified entities.
	// Returns a map with entity names as keys and slices of directly connected entities as values.
	// Used to expand the context during queries.
	GraphRelatedEntities(names []string) (map[string][]GraphEntity, error)
}

GraphStorage defines the interface for graph database operations. It provides methods to query and manipulate entities and relationships in a knowledge graph.

type KeyValueStorage

type KeyValueStorage interface {
	// KVSource retrieves a source document chunk by its ID.
	// Returns an error if the source doesn't exist or can't be retrieved.
	KVSource(id string) (Source, error)
	// KVUpsertSources creates or updates multiple source document chunks at once.
	// Each source should be stored with its ID as the key.
	// This is called during document processing to store chunked documents.
	KVUpsertSources(sources []Source) error
}

KeyValueStorage defines the interface for key-value storage operations. It provides methods to access and store source documents.

type KeywordExtractionPromptData

type KeywordExtractionPromptData struct {
	Goal     string
	Examples []KeywordExtractionPromptExample

	Query   string
	History string
}

KeywordExtractionPromptData contains the data needed to generate prompts for extracting keywords from user queries and conversation history. It includes the goal of keyword extraction, examples for demonstration, the current query, and relevant conversation history.

type KeywordExtractionPromptExample

type KeywordExtractionPromptExample struct {
	Query             string
	LowLevelKeywords  []string
	HighLevelKeywords []string
}

KeywordExtractionPromptExample provides sample inputs and outputs for demonstrating keyword extraction to language models. It includes a sample query along with expected high-level and low-level keywords that should be extracted from the query.

type LLM

type LLM interface {
	// Chat sends messages to the LLM and returns the response.
	// Messages at even indexes are guaranteed to be from the user, while messages at odd
	// indexes are from the assistant.
	Chat(messages []string) (string, error)
}

LLM defines the interface for language model operations. Implementations only need to provide the Chat method, which sends messages to a language model and returns its response.

type QueryConversation

type QueryConversation struct {
	Message string
	Role    string
}

QueryConversation represents a message in a conversation with its role.

func (QueryConversation) String

func (q QueryConversation) String() string

String returns a string representation of the QueryConversation showing its role and content.

type QueryHandler

type QueryHandler interface {
	// KeywordExtractionPromptData returns the data needed to generate prompts for extracting
	// keywords from user queries and conversation history.
	// The implementation doesn't need to fill the Query and History fields, as they will be filled
	// in the Query function.
	KeywordExtractionPromptData() KeywordExtractionPromptData
}

QueryHandler defines the interface for handling RAG query operations.
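
A custom QueryHandler only needs to supply the prompt data; Query fills in the Query and History fields. A minimal sketch, with a hypothetical keywordHandler type and placeholder goal text:

// keywordHandler is a hypothetical QueryHandler supplying its own
// keyword-extraction prompt data.
type keywordHandler struct{}

func (keywordHandler) KeywordExtractionPromptData() golightrag.KeywordExtractionPromptData {
	return golightrag.KeywordExtractionPromptData{
		Goal: "Identify high-level themes and low-level details in the user's query.",
		Examples: []golightrag.KeywordExtractionPromptExample{
			{
				Query:             "How does the payment service talk to the ledger?",
				HighLevelKeywords: []string{"service integration", "payments"},
				LowLevelKeywords:  []string{"payment service", "ledger"},
			},
		},
	}
}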

type QueryResult

type QueryResult struct {
	GlobalEntities      []EntityContext
	GlobalRelationships []RelationshipContext
	GlobalSources       []SourceContext
	LocalEntities       []EntityContext
	LocalRelationships  []RelationshipContext
	LocalSources        []SourceContext
}

QueryResult contains the retrieved context from both global and local searches. It includes entities, relationships, and sources organized by context type.

func Query

func Query(
	conversations []QueryConversation,
	handler QueryHandler,
	storage Storage,
	llm LLM,
	logger *slog.Logger,
) (QueryResult, error)

Query performs a RAG search using the provided conversations. It extracts keywords from the user's query, searches for relevant entities and relationships in both local and global contexts, and returns the combined results.

func (QueryResult) String

func (q QueryResult) String() string

String returns a CSV-formatted string representation of the QueryResult with entities, relationships, and sources organized in sections.

type RelationshipContext

type RelationshipContext struct {
	Source      string
	Target      string
	Keywords    string
	Description string
	Weight      float64
	RefCount    int
	CreatedAt   time.Time
}

RelationshipContext represents a relationship between entities retrieved from the knowledge graph.

func (RelationshipContext) String

func (r RelationshipContext) String() string

String returns a CSV-formatted string representation of the RelationshipContext.

type Source

type Source struct {
	ID         string
	Content    string
	TokenSize  int
	OrderIndex int
}

Source represents a document chunk with metadata. It contains the text content, size information, and position data.

type SourceContext

type SourceContext struct {
	Content  string
	RefCount int
}

SourceContext represents a source document chunk with reference count.

func (SourceContext) String

func (s SourceContext) String() string

String returns a CSV-formatted string representation of the SourceContext.

type Storage

type Storage interface {
	GraphStorage
	VectorStorage
	KeyValueStorage
}

Storage is a composite interface that combines GraphStorage, VectorStorage, and KeyValueStorage interfaces to provide comprehensive data storage capabilities.

type VectorStorage

type VectorStorage interface {
	// VectorQueryEntity performs a semantic search for entities based on the provided keywords.
	// Returns a slice of entity names that semantically match the keywords.
	// The results should be ordered by relevance.
	VectorQueryEntity(keywords string) ([]string, error)
	// VectorQueryRelationship performs a semantic search for relationships based on the provided keywords.
	// Returns a slice of source-target entity name pairs that semantically match the keywords.
	// The results should be ordered by relevance.
	VectorQueryRelationship(keywords string) ([][2]string, error)

	// VectorUpsertEntity creates or updates the vector representation of an entity.
	// The content parameter should contain the text used for semantic matching.
	// This typically includes the entity name and description.
	VectorUpsertEntity(name, content string) error
	// VectorUpsertRelationship creates or updates the vector representation of a relationship.
	// The content parameter should contain the text used for semantic matching.
	// This typically includes keywords, descriptions, and entity names.
	VectorUpsertRelationship(source, target, content string) error
}

VectorStorage defines the interface for vector database operations. It provides methods to query and store entities and relationships in a vector space for semantic search capabilities.

Directories

Path              Synopsis
examples
    default       command
    multiple      command
