multiple

command

v0.1.2 Latest Latest Go to latest Published: Apr 5, 2025 License: MIT Imports: 20 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/MegaGrindStone/go-light-rag

Links

Open Source Insights

README ¶

go-light-rag Multiple Document Example

This example demonstrates how to use the go-light-rag library to build a RAG system that processes multiple documents. The implementation showcases how to handle different file types with specialized handlers and efficiently process document changes across multiple runs.

Prerequisites

Before running this example, you'll need:

A graph database (Neo4J or MemGraph)
An OpenAI API key
A directory of text/code files for processing

Setup Instructions

1. Clone & Navigate

After cloning the repository, change to the example directory:

cd example/multiple

2. Set Up Graph Database

Choose one of these options:

Create a free account on Neo4J
Self-host a compatible graph database like MemGraph

Make note of your graph database URI, username, and password.

3. Get OpenAI API Key

Create an account on OpenAI and generate an API key.

4. Prepare Your Documents Directory

Create a directory containing the files you want to analyze. The system supports various file types with specialized handling for Go source files.

5. Configure the Application

Copy the example configuration file and update it with your credentials:

cp config.example.yaml config.yaml

Edit config.yaml to include your:

Neo4J connection details
OpenAI API key and model
Document directory path (docs_dir)
Custom entity types (optional)
Preferred logging level

Running the Example

Execute the application:

go run main.go

First run: The system will process all files in your documents directory and build the RAG database. This can take significant time depending on the size of your codebase (approximately 1 hour for the entire go-light-rag codebase).

Subsequent runs: The system will check for changed files using hash comparisons and only process those that have been modified since the previous run.

How It Works

The system:

Walks through your document directory (respecting .gitignore files)
Selects appropriate handlers based on file extensions (Go handler for .go files, Default handler for others)
Stores document hashes to track changes between runs
Embeds document text using OpenAI
Stores vectors in ChromeM and metadata in BoltDB
Creates a knowledge graph in Neo4J
Provides an interactive query interface with handler selection
Retrieves relevant information for each query using the selected handler
Uses a prompt template to ensure consistent output quality

Key Features

Multiple Handler Support

The example provides specialized handling for different file types:

Default Handler: For general text documents
Go Handler: Specialized for Go source code files

Incremental Processing

The system uses file hashing to detect changes:

Only processes new or modified files on subsequent runs
Saves significant time when working with large document collections

GitIgnore Support

The system respects .gitignore files in your document directory:

Automatically skips files that match gitignore patterns
Follows standard gitignore rules for path matching

Custom Entity Types

You can configure custom entity types in the configuration file to better suit your document domain:

defaultEntityTypes:
  - "config"
  - "dependency"
  - "reference"
  # ... other domain-specific entity types

Query Interface

When running the example, you'll be prompted to:

Select a handler type for your query (Default or Go)
Enter your query
Review the results, which include both vector retrieval data and knowledge graph relationships

This allows you to choose the most appropriate handler for each specific query, improving result quality.

Demo

Note: This demo uses the go-light-rag codebase itself as the knowledge source and runs with the gpt-4o-mini model. The initial document insertion process (which takes approximately 1 hour) is not included in the demo recording.

Documentation ¶

There is no documentation for this package.

Source Files ¶

View all Source files

main.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL