Documentation
Index
- Constants
- Variables
- func Insert(doc Document, handler DocumentHandler, storage Storage, llm LLM, ...) error
- type Document
- type DocumentHandler
- type EntityContext
- type EntityExtractionPromptData
- type EntityExtractionPromptEntityOutput
- type EntityExtractionPromptExample
- type EntityExtractionPromptRelationshipOutput
- type GraphEntity
- type GraphRelationship
- type GraphStorage
- type KeyValueStorage
- type KeywordExtractionPromptData
- type KeywordExtractionPromptExample
- type LLM
- type QueryConversation
- type QueryHandler
- type QueryResult
- type RelationshipContext
- type Source
- type SourceContext
- type Storage
- type VectorStorage
Constants
const (
	// RoleUser represents the user role in a conversation.
	RoleUser = "user"
	// RoleAssistant represents the assistant role in a conversation.
	RoleAssistant = "assistant"
)
const GraphFieldSeparator = "<SEP>"
GraphFieldSeparator is a constant used to separate fields in a graph.
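The separator suggests that multi-valued fields, such as the Descriptions and SourceIDs strings on GraphEntity, may be stored as one joined string. The sketch below illustrates that reading; joinFields and splitFields are hypothetical helpers, not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// GraphFieldSeparator mirrors the constant documented above.
const GraphFieldSeparator = "<SEP>"

// joinFields is a hypothetical helper that packs several values into one field.
func joinFields(values []string) string {
	return strings.Join(values, GraphFieldSeparator)
}

// splitFields is the hypothetical inverse; an empty field yields no values.
func splitFields(field string) []string {
	if field == "" {
		return nil
	}
	return strings.Split(field, GraphFieldSeparator)
}

func main() {
	packed := joinFields([]string{"A chip maker.", "Founded in 1993."})
	fmt.Println(packed)
	fmt.Println(len(splitFields(packed)))
}
```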
Variables
var (
	// ErrEntityNotFound is returned when an entity is not found in the storage.
	ErrEntityNotFound = errors.New("entity not found")
	// ErrRelationshipNotFound is returned when a relationship is not found in the storage.
	ErrRelationshipNotFound = errors.New("relationship not found")
)
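Because these are sentinel errors, a storage implementation can wrap them with extra context and callers can still match them with errors.Is. The following is a minimal sketch; lookup and isMissing are hypothetical stand-ins, and whether the package itself matches with errors.Is or with == is not stated in this documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEntityNotFound mirrors the sentinel error documented above.
var ErrEntityNotFound = errors.New("entity not found")

// lookup is a hypothetical stand-in for a GraphStorage.GraphEntity implementation.
// It wraps the sentinel with %w so the cause stays matchable.
func lookup(entities map[string]string, name string) (string, error) {
	desc, ok := entities[name]
	if !ok {
		return "", fmt.Errorf("lookup %q: %w", name, ErrEntityNotFound)
	}
	return desc, nil
}

// isMissing reports whether err is, or wraps, ErrEntityNotFound.
func isMissing(err error) bool {
	return errors.Is(err, ErrEntityNotFound)
}

func main() {
	entities := map[string]string{"GO": "A programming language."}
	if _, err := lookup(entities, "RUST"); isMissing(err) {
		fmt.Println("entity is missing; it can be created instead of failing")
	}
}
```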
Functions
func Insert
func Insert(doc Document, handler DocumentHandler, storage Storage, llm LLM, logger *slog.Logger) error
Insert processes a document and stores it in the provided storage. It chunks the document content, extracts entities and relationships using the provided document handler, and stores the results in the appropriate storage. It returns an error if any step in the process fails.
Types
type Document
Document represents a text document to be processed and stored. It contains an ID for unique identification and the content to be analyzed.
type DocumentHandler
type DocumentHandler interface {
	// ChunksDocument splits a document's content into smaller, manageable chunks.
	// It returns a slice of Source objects representing the document chunks,
	// without assigning IDs (IDs will be generated in the Insert function).
	ChunksDocument(content string) ([]Source, error)
	// EntityExtractionPromptData returns the data needed to generate prompts for extracting
	// entities and relationships from text content.
	// The implementation doesn't need to fill the Input field, as it will be filled in the
	// Insert function.
	EntityExtractionPromptData() EntityExtractionPromptData
	// MaxRetries determines the maximum number of retries allowed for the Chat function.
	// Retries matter most when extracting entities and relationships from text content,
	// because the LLM sometimes returns incorrectly formatted output.
	MaxRetries() int
	// ConcurrencyCount determines the number of concurrent requests to the LLM.
	ConcurrencyCount() int
	// BackoffDuration determines the backoff duration between retries.
	BackoffDuration() time.Duration
	// GleanCount returns the maximum number of additional extraction attempts
	// to perform after the initial entity extraction to find entities that might
	// have been missed.
	GleanCount() int
	// MaxSummariesTokenLength returns the maximum token length allowed for entity
	// and relationship descriptions before they need to be summarized by the LLM.
	MaxSummariesTokenLength() int
}
DocumentHandler defines how documents are processed during Insert: how content is chunked, which prompt data drives entity extraction, and how LLM calls are retried, backed off, parallelized, gleaned, and summarized.
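To illustrate how MaxRetries and BackoffDuration could interact with Chat, here is a minimal retry sketch. chatWithRetry, flakyLLM, and looksLikeJSONObject are hypothetical names, and it assumes that MaxRetries counts attempts after the first call and that the backoff is a fixed delay; the package's actual retry loop is not shown in this documentation.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// LLM mirrors the interface documented in this package.
type LLM interface {
	Chat(messages []string) (string, error)
}

// chatWithRetry is a hypothetical helper. It calls Chat up to 1+maxRetries times,
// sleeping for backoff between attempts, until validate accepts the response.
func chatWithRetry(llm LLM, messages []string, maxRetries int, backoff time.Duration, validate func(string) error) (string, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			time.Sleep(backoff)
		}
		resp, err := llm.Chat(messages)
		if err == nil {
			err = validate(resp)
		}
		if err == nil {
			return resp, nil
		}
		lastErr = err
	}
	return "", fmt.Errorf("giving up after %d attempts: %w", maxRetries+1, lastErr)
}

// flakyLLM is a test double that returns malformed output for its first bad calls.
type flakyLLM struct{ calls, bad int }

func (f *flakyLLM) Chat(_ []string) (string, error) {
	f.calls++
	if f.calls <= f.bad {
		return "not json", nil
	}
	return `{"entities":[]}`, nil
}

// looksLikeJSONObject is a deliberately crude format check used only in this sketch.
func looksLikeJSONObject(s string) error {
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return errors.New("response is not a JSON object")
	}
	return nil
}

func main() {
	llm := &flakyLLM{bad: 2}
	resp, err := chatWithRetry(llm, []string{"Extract entities."}, 3, 10*time.Millisecond, looksLikeJSONObject)
	fmt.Println(resp, err, llm.calls)
}
```

Validation failures are retried the same way as transport errors here, which matches the stated motivation that the LLM sometimes returns output in the wrong format.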
type EntityContext
type EntityContext struct {
	Name        string
	Type        string
	Description string
	RefCount    int
	CreatedAt   time.Time
}
EntityContext represents an entity retrieved from the knowledge graph with its context.
func (EntityContext) String
func (e EntityContext) String() string
String returns a CSV-formatted string representation of the EntityContext.
type EntityExtractionPromptData
type EntityExtractionPromptData struct {
	Goal        string
	EntityTypes []string
	Language    string
	Examples    []EntityExtractionPromptExample
	Input       string
}
EntityExtractionPromptData contains the data needed to generate prompts for extracting entities and relationships from text content. It includes the goal of extraction, valid entity types, target language, example extractions, and the input text to be processed.
type EntityExtractionPromptEntityOutput
EntityExtractionPromptEntityOutput represents the expected output format for an entity identified during extraction. It includes the entity's name, type, and description.
type EntityExtractionPromptExample
type EntityExtractionPromptExample struct {
	EntityTypes          []string
	Text                 string
	EntitiesOutputs      []EntityExtractionPromptEntityOutput
	RelationshipsOutputs []EntityExtractionPromptRelationshipOutput
}
EntityExtractionPromptExample provides sample inputs and outputs for demonstrating entity extraction to language models. It includes sample text content along with the expected entities and relationships that should be extracted from the text.
type EntityExtractionPromptRelationshipOutput
type EntityExtractionPromptRelationshipOutput struct {
	SourceEntity string
	TargetEntity string
	Description  string
	Keywords     []string
	Strength     float64
}
EntityExtractionPromptRelationshipOutput represents the expected output format for a relationship identified between entities during extraction. It includes source and target entities, description, relevant keywords, and a strength value indicating the relationship's importance.
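The prompt types above fit together as one few-shot example nested inside the prompt data. The sketch below builds such a value using local copies of the documented structs. The field names on EntityExtractionPromptEntityOutput (Name, Type, Description) are assumptions, since this documentation describes its contents without listing the fields, and newPromptData and all sample values are purely illustrative.

```go
package main

import "fmt"

// Local copies of the documented structs so that the sketch is self-contained.
type EntityExtractionPromptEntityOutput struct {
	Name        string // assumed field name
	Type        string // assumed field name
	Description string // assumed field name
}

type EntityExtractionPromptRelationshipOutput struct {
	SourceEntity string
	TargetEntity string
	Description  string
	Keywords     []string
	Strength     float64
}

type EntityExtractionPromptExample struct {
	EntityTypes          []string
	Text                 string
	EntitiesOutputs      []EntityExtractionPromptEntityOutput
	RelationshipsOutputs []EntityExtractionPromptRelationshipOutput
}

type EntityExtractionPromptData struct {
	Goal        string
	EntityTypes []string
	Language    string
	Examples    []EntityExtractionPromptExample
	Input       string
}

// newPromptData is a hypothetical constructor. Input is left empty because,
// per the DocumentHandler comments, Insert fills it in for each chunk.
func newPromptData() EntityExtractionPromptData {
	types := []string{"PERSON", "ORGANIZATION"}
	return EntityExtractionPromptData{
		Goal:        "Identify people, organizations, and how they relate.",
		EntityTypes: types,
		Language:    "English",
		Examples: []EntityExtractionPromptExample{{
			EntityTypes: types,
			Text:        "Alice works at Acme Corp.",
			EntitiesOutputs: []EntityExtractionPromptEntityOutput{
				{Name: "ALICE", Type: "PERSON", Description: "An employee of Acme Corp."},
				{Name: "ACME CORP", Type: "ORGANIZATION", Description: "Alice's employer."},
			},
			RelationshipsOutputs: []EntityExtractionPromptRelationshipOutput{{
				SourceEntity: "ALICE",
				TargetEntity: "ACME CORP",
				Description:  "Alice is employed by Acme Corp.",
				Keywords:     []string{"employment"},
				Strength:     8,
			}},
		}},
	}
}

func main() {
	data := newPromptData()
	fmt.Printf("%d example(s), %d entity types\n", len(data.Examples), len(data.EntityTypes))
}
```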
type GraphEntity
type GraphEntity struct {
	Name         string `json:"entity_name"`
	Type         string `json:"entity_type"`
	Descriptions string `json:"entity_description"`
	SourceIDs    string
	CreatedAt    time.Time
}
GraphEntity represents an entity in the knowledge graph. It contains information about the entity's name, type, descriptions, sources, and creation timestamp.
type GraphRelationship
type GraphRelationship struct {
	SourceEntity string   `json:"source_entity"`
	TargetEntity string   `json:"target_entity"`
	Weight       float64  `json:"relationship_strength"`
	Descriptions string   `json:"relationship_description"`
	Keywords     []string `json:"relationship_keywords"`
	SourceIDs    string
	CreatedAt    time.Time
}
GraphRelationship represents a relationship between two entities in the knowledge graph. It contains information about the source and target entities, relationship weight, descriptions, keywords, sources, and creation timestamp.
type GraphStorage
type GraphStorage interface {
	// GraphEntity retrieves a single entity by name from the graph storage.
	// Returns ErrEntityNotFound if the entity doesn't exist.
	GraphEntity(name string) (GraphEntity, error)
	// GraphRelationship retrieves a relationship between sourceEntity and targetEntity.
	// Returns ErrRelationshipNotFound if the relationship doesn't exist.
	GraphRelationship(sourceEntity, targetEntity string) (GraphRelationship, error)
	// GraphUpsertEntity creates a new entity or updates an existing entity in the graph storage.
	// If the entity already exists, it should merge the new data with existing data.
	GraphUpsertEntity(entity GraphEntity) error
	// GraphUpsertRelationship creates a new relationship or updates an existing relationship
	// between two entities in the graph storage.
	// If the relationship already exists, it should merge the new data with existing data.
	GraphUpsertRelationship(relationship GraphRelationship) error
	// GraphEntities batch retrieves multiple entities by their names.
	// Returns a map with entity names as keys and entity objects as values.
	// If an entity doesn't exist, it should be omitted from the result map.
	GraphEntities(names []string) (map[string]GraphEntity, error)
	// GraphRelationships batch retrieves multiple relationships by their source-target pairs.
	// Returns a map with composite keys (formatted as "source-target") as keys and
	// relationship objects as values.
	// If a relationship doesn't exist, it should be omitted from the result map.
	GraphRelationships(pairs [][2]string) (map[string]GraphRelationship, error)
	// GraphCountEntitiesRelationships counts the number of relationships each entity has.
	// Returns a map with entity names as keys and relationship counts as values.
	// This is used to determine entity importance during queries.
	GraphCountEntitiesRelationships(names []string) (map[string]int, error)
	// GraphRelatedEntities finds entities directly connected to the specified entities.
	// Returns a map with entity names as keys and slices of directly connected entities as values.
	// Used to expand the context during queries.
	GraphRelatedEntities(names []string) (map[string][]GraphEntity, error)
}
GraphStorage defines the interface for graph database operations. It provides methods to query and manipulate entities and relationships in a knowledge graph.
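The entity half of this contract can be sketched with a map-backed store. memoryGraph, newMemoryGraph, and relationshipKey are hypothetical names, the struct is a trimmed local copy, and the merge policy shown (appending descriptions and source IDs with the field separator) is only one plausible interpretation of "merge the new data with existing data". The store is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

const GraphFieldSeparator = "<SEP>"

var ErrEntityNotFound = errors.New("entity not found")

// GraphEntity is a trimmed local copy of the documented struct.
type GraphEntity struct {
	Name         string
	Type         string
	Descriptions string
	SourceIDs    string
}

// memoryGraph is a hypothetical in-memory store covering only the entity methods.
type memoryGraph struct {
	entities map[string]GraphEntity
}

func newMemoryGraph() *memoryGraph {
	return &memoryGraph{entities: map[string]GraphEntity{}}
}

// GraphEntity returns ErrEntityNotFound for unknown names, as the interface requires.
func (g *memoryGraph) GraphEntity(name string) (GraphEntity, error) {
	e, ok := g.entities[name]
	if !ok {
		return GraphEntity{}, ErrEntityNotFound
	}
	return e, nil
}

// GraphUpsertEntity inserts a new entity or merges into an existing one.
// The merge policy (append with the separator, keep the old type if the new
// one is empty) is an assumption made for this sketch.
func (g *memoryGraph) GraphUpsertEntity(entity GraphEntity) error {
	if old, ok := g.entities[entity.Name]; ok {
		entity.Descriptions = old.Descriptions + GraphFieldSeparator + entity.Descriptions
		entity.SourceIDs = old.SourceIDs + GraphFieldSeparator + entity.SourceIDs
		if entity.Type == "" {
			entity.Type = old.Type
		}
	}
	g.entities[entity.Name] = entity
	return nil
}

// GraphEntities omits missing names from the result instead of returning an error.
func (g *memoryGraph) GraphEntities(names []string) (map[string]GraphEntity, error) {
	out := make(map[string]GraphEntity, len(names))
	for _, name := range names {
		if e, ok := g.entities[name]; ok {
			out[name] = e
		}
	}
	return out, nil
}

// relationshipKey builds the "source-target" composite key described for GraphRelationships.
func relationshipKey(source, target string) string {
	return source + "-" + target
}

func main() {
	g := newMemoryGraph()
	_ = g.GraphUpsertEntity(GraphEntity{Name: "GO", Type: "LANGUAGE", Descriptions: "A language.", SourceIDs: "s1"})
	_ = g.GraphUpsertEntity(GraphEntity{Name: "GO", Descriptions: "Made at Google.", SourceIDs: "s2"})
	e, _ := g.GraphEntity("GO")
	fmt.Println(e.Descriptions)
	fmt.Println(relationshipKey("GO", "GOOGLE"))
}
```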
type KeyValueStorage
type KeyValueStorage interface {
	// KVSource retrieves a source document chunk by its ID.
	// Returns an error if the source doesn't exist or can't be retrieved.
	KVSource(id string) (Source, error)
	// KVUpsertSources creates or updates multiple source document chunks at once.
	// Each source should be stored with its ID as the key.
	// This is called during document processing to store chunked documents.
	KVUpsertSources(sources []Source) error
}
KeyValueStorage defines the interface for key-value storage operations. It provides methods to access and store source documents.
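A mutex-guarded map is enough to sketch this interface for implementations that may be called from several goroutines. The Source fields used here (ID, Content) are assumed names, because this documentation does not list the struct's fields, and memoryKV, newMemoryKV, and errSourceNotFound are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Source is a hypothetical stand-in; the real struct's field names are not documented here.
type Source struct {
	ID      string // assumed field name
	Content string // assumed field name
}

// errSourceNotFound is a hypothetical sentinel; the package documents none for sources.
var errSourceNotFound = errors.New("source not found")

// memoryKV is a hypothetical in-memory KeyValueStorage guarded by a read-write mutex.
type memoryKV struct {
	mu      sync.RWMutex
	sources map[string]Source
}

func newMemoryKV() *memoryKV {
	return &memoryKV{sources: map[string]Source{}}
}

// KVSource returns an error when the ID is unknown, as the interface requires.
func (m *memoryKV) KVSource(id string) (Source, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	src, ok := m.sources[id]
	if !ok {
		return Source{}, fmt.Errorf("source %q: %w", id, errSourceNotFound)
	}
	return src, nil
}

// KVUpsertSources validates every source first so that a bad batch stores nothing,
// then writes each source under its ID.
func (m *memoryKV) KVUpsertSources(sources []Source) error {
	for _, src := range sources {
		if src.ID == "" {
			return errors.New("source has an empty ID")
		}
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, src := range sources {
		m.sources[src.ID] = src
	}
	return nil
}

func main() {
	kv := newMemoryKV()
	_ = kv.KVUpsertSources([]Source{{ID: "s1", Content: "first chunk"}})
	src, err := kv.KVSource("s1")
	fmt.Println(src.Content, err)
}
```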
type KeywordExtractionPromptData
type KeywordExtractionPromptData struct {
	Goal     string
	Examples []KeywordExtractionPromptExample
	Query    string
	History  string
}
KeywordExtractionPromptData contains the data needed to generate prompts for extracting keywords from user queries and conversation history. It includes the goal of keyword extraction, examples for demonstration, the current query, and relevant conversation history.
type KeywordExtractionPromptExample
type KeywordExtractionPromptExample struct {
	Query             string
	LowLevelKeywords  []string
	HighLevelKeywords []string
}
KeywordExtractionPromptExample provides sample inputs and outputs for demonstrating keyword extraction to language models. It includes a sample query along with expected high-level and low-level keywords that should be extracted from the query.
type LLM
type LLM interface {
	// Chat sends messages to the LLM and returns the response.
	// Messages at even indexes are guaranteed to come from the user, and messages at
	// odd indexes from the assistant.
	Chat(messages []string) (string, error)
}
LLM defines the interface for language model operations. It exposes a single Chat method; retries, backoff, concurrency, and summary token limits are configured through DocumentHandler.
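The index convention for Chat can be made concrete with a small adapter that labels each message by position. roleFor, labelMessages, and transcriptLLM are hypothetical; a real implementation would typically map the same roles onto its provider's chat message format.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const (
	RoleUser      = "user"
	RoleAssistant = "assistant"
)

// LLM mirrors the interface documented above.
type LLM interface {
	Chat(messages []string) (string, error)
}

// roleFor returns the role implied by a message's index: even is user, odd is assistant.
func roleFor(index int) string {
	if index%2 == 0 {
		return RoleUser
	}
	return RoleAssistant
}

// labelMessages prefixes each message with its positional role.
func labelMessages(messages []string) []string {
	out := make([]string, len(messages))
	for i, m := range messages {
		out[i] = roleFor(i) + ": " + m
	}
	return out
}

// transcriptLLM is a hypothetical test double. Instead of calling a model,
// it returns the labelled transcript it would have sent.
type transcriptLLM struct{}

func (transcriptLLM) Chat(messages []string) (string, error) {
	if len(messages) == 0 {
		return "", errors.New("no messages to send")
	}
	return strings.Join(labelMessages(messages), "\n"), nil
}

// Compile-time check that the double satisfies the interface.
var _ LLM = transcriptLLM{}

func main() {
	out, err := transcriptLLM{}.Chat([]string{"Hi", "Hello!", "Who created Go?"})
	fmt.Println(out, err)
}
```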
type QueryConversation
QueryConversation represents a message in a conversation with its role.
func (QueryConversation) String
func (q QueryConversation) String() string
String returns a string representation of the QueryConversation showing its role and content.
type QueryHandler
type QueryHandler interface {
	// KeywordExtractionPromptData returns the data needed to generate prompts for extracting
	// keywords from user queries and conversation history.
	// The implementation doesn't need to fill the Query and History fields, as they will be filled
	// in the Query function.
	KeywordExtractionPromptData() KeywordExtractionPromptData
}
QueryHandler defines the interface for handling RAG query operations.
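A handler can be as small as a type that returns static prompt data. The sketch below uses local copies of the documented types; staticQueryHandler and the sample goal, query, and keywords are illustrative assumptions.

```go
package main

import "fmt"

// Local copies of the documented types so that the sketch is self-contained.
type KeywordExtractionPromptExample struct {
	Query             string
	LowLevelKeywords  []string
	HighLevelKeywords []string
}

type KeywordExtractionPromptData struct {
	Goal     string
	Examples []KeywordExtractionPromptExample
	Query    string
	History  string
}

type QueryHandler interface {
	KeywordExtractionPromptData() KeywordExtractionPromptData
}

// staticQueryHandler is a hypothetical handler. It leaves Query and History
// empty because, per the interface comment, Query fills them in.
type staticQueryHandler struct{}

func (staticQueryHandler) KeywordExtractionPromptData() KeywordExtractionPromptData {
	return KeywordExtractionPromptData{
		Goal: "Extract keywords that help retrieve relevant entities and relationships.",
		Examples: []KeywordExtractionPromptExample{{
			Query:             "How does international trade affect economic stability?",
			LowLevelKeywords:  []string{"tariffs", "exchange rates", "imports"},
			HighLevelKeywords: []string{"international trade", "economic stability"},
		}},
	}
}

// Compile-time check that the handler satisfies the interface.
var _ QueryHandler = staticQueryHandler{}

func main() {
	data := staticQueryHandler{}.KeywordExtractionPromptData()
	fmt.Println(len(data.Examples), data.Examples[0].HighLevelKeywords)
}
```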
type QueryResult
type QueryResult struct {
	GlobalEntities      []EntityContext
	GlobalRelationships []RelationshipContext
	GlobalSources       []SourceContext
	LocalEntities       []EntityContext
	LocalRelationships  []RelationshipContext
	LocalSources        []SourceContext
}
QueryResult contains the retrieved context from both global and local searches. It includes entities, relationships, and sources organized by context type.
func Query
func Query(
	conversations []QueryConversation,
	handler QueryHandler,
	storage Storage,
	llm LLM,
	logger *slog.Logger,
) (QueryResult, error)
Query performs a RAG search using the provided conversations. It extracts keywords from the user's query, searches for relevant entities and relationships in both local and global contexts, and returns the combined results.
func (QueryResult) String
func (q QueryResult) String() string
String returns a CSV-formatted string representation of the QueryResult with entities, relationships, and sources organized in sections.
type RelationshipContext
type RelationshipContext struct {
	Source      string
	Target      string
	Keywords    string
	Description string
	Weight      float64
	RefCount    int
	CreatedAt   time.Time
}
RelationshipContext represents a relationship between entities retrieved from the knowledge graph.
func (RelationshipContext) String
func (r RelationshipContext) String() string
String returns a CSV-formatted string representation of the RelationshipContext.
type Source
Source represents a document chunk with metadata. It contains the text content, size information, and position data.
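To show what a ChunksDocument implementation might return, here is a word-count chunker. The Source fields (Content, TokenSize, OrderIndex) are assumed names chosen to match the description of text content, size, and position; the real struct's fields are not listed in this documentation. Counting whitespace-separated words stands in for a real tokenizer, and no IDs are assigned because Insert generates them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Source is a hypothetical stand-in for the documented type.
type Source struct {
	Content    string // assumed field name: chunk text
	TokenSize  int    // assumed field name: size, approximated here by word count
	OrderIndex int    // assumed field name: position of the chunk in the document
}

// chunkByWords splits content into chunks of at most maxWords words each.
// It is a simple stand-in for token-based chunking.
func chunkByWords(content string, maxWords int) ([]Source, error) {
	if maxWords <= 0 {
		return nil, errors.New("maxWords must be positive")
	}
	words := strings.Fields(content)
	var chunks []Source
	for start := 0; start < len(words); start += maxWords {
		end := start + maxWords
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, Source{
			Content:    strings.Join(words[start:end], " "),
			TokenSize:  end - start,
			OrderIndex: len(chunks),
		})
	}
	return chunks, nil
}

func main() {
	chunks, err := chunkByWords("the quick brown fox jumps", 2)
	if err != nil {
		fmt.Println("chunking failed:", err)
		return
	}
	for _, c := range chunks {
		fmt.Printf("%d: %q (%d words)\n", c.OrderIndex, c.Content, c.TokenSize)
	}
}
```

Production chunkers often overlap adjacent chunks so that entities spanning a boundary are not lost; that refinement is omitted here for brevity.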
type SourceContext
SourceContext represents a source document chunk with reference count.
func (SourceContext) String
func (s SourceContext) String() string
String returns a CSV-formatted string representation of the SourceContext.
type Storage
type Storage interface {
	GraphStorage
	VectorStorage
	KeyValueStorage
}
Storage is a composite interface that combines GraphStorage, VectorStorage, and KeyValueStorage interfaces to provide comprehensive data storage capabilities.
type VectorStorage
type VectorStorage interface {
	// VectorQueryEntity performs a semantic search for entities based on the provided keywords.
	// Returns a slice of entity names that semantically match the keywords.
	// The results should be ordered by relevance.
	VectorQueryEntity(keywords string) ([]string, error)
	// VectorQueryRelationship performs a semantic search for relationships based on the provided keywords.
	// Returns a slice of source-target entity name pairs that semantically match the keywords.
	// The results should be ordered by relevance.
	VectorQueryRelationship(keywords string) ([][2]string, error)
	// VectorUpsertEntity creates or updates the vector representation of an entity.
	// The content parameter should contain the text used for semantic matching.
	// This typically includes the entity name and description.
	VectorUpsertEntity(name, content string) error
	// VectorUpsertRelationship creates or updates the vector representation of a relationship.
	// The content parameter should contain the text used for semantic matching.
	// This typically includes keywords, descriptions, and entity names.
	VectorUpsertRelationship(source, target, content string) error
}
VectorStorage defines the interface for vector database operations. It provides methods to query and store entities and relationships in a vector space for semantic search capabilities.
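For tests, the entity half of this interface can be approximated without embeddings. The sketch below swaps semantic search for plain keyword overlap: it scores each entity by the number of distinct query words found in its content and returns names ordered by that score. keywordIndex, newKeywordIndex, and tokenSet are hypothetical, and a real implementation would use an embedding model with a vector index instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// keywordIndex is a hypothetical stand-in for a vector store. It keeps the raw
// content per entity and ranks by word overlap, not by embedding similarity.
type keywordIndex struct {
	entities map[string]string
}

func newKeywordIndex() *keywordIndex {
	return &keywordIndex{entities: map[string]string{}}
}

// VectorUpsertEntity stores or replaces the searchable content for an entity.
func (k *keywordIndex) VectorUpsertEntity(name, content string) error {
	k.entities[name] = content
	return nil
}

// VectorQueryEntity returns entity names with at least one matching word,
// ordered by descending overlap and then by name for a stable result.
func (k *keywordIndex) VectorQueryEntity(keywords string) ([]string, error) {
	query := tokenSet(keywords)
	scores := map[string]int{}
	var names []string
	for name, content := range k.entities {
		score := 0
		for tok := range tokenSet(content) {
			if query[tok] {
				score++
			}
		}
		if score > 0 {
			scores[name] = score
			names = append(names, name)
		}
	}
	sort.Slice(names, func(i, j int) bool {
		if scores[names[i]] != scores[names[j]] {
			return scores[names[i]] > scores[names[j]]
		}
		return names[i] < names[j]
	})
	return names, nil
}

// tokenSet lowercases s and returns its distinct whitespace-separated words.
func tokenSet(s string) map[string]bool {
	set := map[string]bool{}
	for _, tok := range strings.Fields(strings.ToLower(s)) {
		set[tok] = true
	}
	return set
}

func main() {
	idx := newKeywordIndex()
	_ = idx.VectorUpsertEntity("GO", "go programming language google")
	_ = idx.VectorUpsertEntity("RUST", "rust programming language mozilla")
	names, _ := idx.VectorQueryEntity("google programming language")
	fmt.Println(names)
}
```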