docs

package
v0.0.2 Latest
Warning

This package is not in the latest version of its module.

Published: Jun 25, 2024 License: BSD-3-Clause Imports: 5 Imported by: 0

Documentation

Overview

Package docs implements a corpus of text documents identified by document IDs. It allows retrieving the documents by ID as well as retrieving documents that are new since a previous scan.

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Corpus

type Corpus struct {
	// contains filtered or unexported fields
}

A Corpus is the collection of documents stored in a database.

func New

func New(db storage.DB) *Corpus

New returns a new Corpus representing the documents stored in db.

func (*Corpus) Add

func (c *Corpus) Add(id, title, text string)

Add adds a document with the given id, title, and text. If the document already exists in the corpus with the same title and text, Add is a no-op. Otherwise, if a document with that id already exists in the corpus, it is replaced.
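The no-op and replace rules can be illustrated with a minimal in-memory sketch. It does not use this package or storage.DB: memCorpus, newMemCorpus, and the int64 counter standing in for timed.DBTime are hypothetical names introduced here, and giving a replaced document a fresh DBTime is an assumption based on the Doc field comment.

```go
package main

import "fmt"

// Doc mirrors the documented Doc fields; DBTime is simplified to int64.
type Doc struct {
	DBTime int64
	ID     string
	Title  string
	Text   string
}

// memCorpus is a hypothetical in-memory stand-in for Corpus.
type memCorpus struct {
	clock int64 // stand-in for the database clock
	docs  map[string]*Doc
}

func newMemCorpus() *memCorpus {
	return &memCorpus{docs: make(map[string]*Doc)}
}

// Add follows the documented rules: an identical document is left alone,
// anything else is written (or overwritten) with a new DBTime.
func (c *memCorpus) Add(id, title, text string) {
	if old, ok := c.docs[id]; ok && old.Title == title && old.Text == text {
		return // same title and text: no-op
	}
	c.clock++
	c.docs[id] = &Doc{DBTime: c.clock, ID: id, Title: title, Text: text}
}

func main() {
	c := newMemCorpus()
	const id = "https://example.com/a"

	c.Add(id, "A", "first text")
	t1 := c.docs[id].DBTime

	c.Add(id, "A", "first text") // identical: no-op
	t2 := c.docs[id].DBTime

	c.Add(id, "A", "revised text") // different text: replaced
	t3 := c.docs[id].DBTime

	fmt.Println(t1 == t2, t3 > t2) // prints "true true"
}
```

Because an identical Add leaves the stored DBTime untouched in this sketch, re-adding unchanged documents would not make them look new to a later scan.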

func (*Corpus) DocWatcher

func (c *Corpus) DocWatcher(name string) *timed.Watcher[*Doc]

DocWatcher returns a new timed.Watcher with the given name. It picks up where any previous Watcher of the same name left off.
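The "picks up where it left off" behavior can be pictured as a named cursor that remembers the last DBTime processed. The sketch below is not the timed.Watcher API: cursorStore, recent, and markOld are hypothetical helpers, and an in-memory map stands in for whatever state the real watcher persists in the database.

```go
package main

import "fmt"

// Doc keeps only the fields this sketch needs; DBTime is simplified to int64.
type Doc struct {
	DBTime int64
	ID     string
}

// cursorStore is a hypothetical stand-in for persisted watcher state:
// watcher name -> latest DBTime already processed.
type cursorStore map[string]int64

// recent returns the documents newer than the cursor saved under name.
// docs is assumed to be sorted by DBTime already.
func recent(store cursorStore, name string, docs []*Doc) []*Doc {
	var out []*Doc
	for _, d := range docs {
		if d.DBTime > store[name] {
			out = append(out, d)
		}
	}
	return out
}

// markOld records that everything up to and including t has been processed.
func markOld(store cursorStore, name string, t int64) {
	if t > store[name] {
		store[name] = t
	}
}

func main() {
	store := cursorStore{}
	docs := []*Doc{{DBTime: 1, ID: "a"}, {DBTime: 2, ID: "b"}}

	for _, d := range recent(store, "indexer", docs) {
		fmt.Println("first pass:", d.ID)
		markOld(store, "indexer", d.DBTime)
	}

	// A later run under the same name sees only what arrived since.
	docs = append(docs, &Doc{DBTime: 3, ID: "c"})
	for _, d := range recent(store, "indexer", docs) {
		fmt.Println("second pass:", d.ID)
	}
}
```

In this sketch a different name starts from DBTime zero, so independent consumers can each scan the corpus at their own pace.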

func (*Corpus) Docs

func (c *Corpus) Docs(prefix string) iter.Seq[*Doc]

Docs returns an iterator over all documents in the corpus with IDs starting with a given prefix. The documents are ordered by ID.

func (*Corpus) DocsAfter

func (c *Corpus) DocsAfter(dbtime timed.DBTime, prefix string) iter.Seq[*Doc]

DocsAfter returns an iterator over all documents with DBTime greater than dbtime and with IDs starting with the given prefix. The documents are ordered by DBTime.

func (*Corpus) Get

func (c *Corpus) Get(id string) (doc *Doc, ok bool)

Get returns the document with the given id. It returns nil, false if no document is found, and doc, true otherwise.

type Doc

type Doc struct {
	DBTime timed.DBTime // database time (from storage.Now) when Doc was written
	ID     string       // document identifier (such as a URL)
	Title  string       // title of document
	Text   string       // text of document
}

A Doc is a single document in the Corpus.
