iterate

package module
v3.2.0
Published: Aug 3, 2025 License: BSD-3-Clause Imports: 26 Imported by: 36

README

go-whosonfirst-iterate

Go package for iterating through collections of Who's On First documents.

Documentation

Go Reference

Example

Version 3.x of this package introduces major, backward-incompatible changes from earlier releases. That said, migrating from version 2.x to 3.x should be relatively straightforward, as the basic concepts are still the same but (hopefully) simplified. Where version 2.x relied on defining a custom callback for looping over records, version 3.x uses Go's iter.Seq2 iterator construct to yield records as they are encountered.

For example:

package main

import (
	"context"
	"flag"
	"log"

	"github.com/whosonfirst/go-whosonfirst-iterate/v3"
)

func main() {

	var iterator_uri string

	flag.StringVar(&iterator_uri, "iterator-uri", "repo://", "A registered whosonfirst/go-whosonfirst-iterate/v3.Iterator URI.")
	flag.Parse()

	ctx := context.Background()

	iter, _ := iterate.NewIterator(ctx, iterator_uri)
	defer iter.Close()

	paths := flag.Args()

	for rec, _ := range iter.Iterate(ctx, paths...) {
		log.Printf("Indexing %s\n", rec.Path)
		// Close each record body once it has been handled rather than
		// deferring every Close call until main returns.
		rec.Body.Close()
	}
}

Error handling removed for the sake of brevity.

Version 2.x (the old way)

This is how you would do the same thing using the older version 2.x code:

package main

import (
	"context"
	"flag"
	"io"
	"log"

	// Blank import: this example never refers to the emitter package by name.
	_ "github.com/whosonfirst/go-whosonfirst-iterate/v2/emitter"
	"github.com/whosonfirst/go-whosonfirst-iterate/v2/iterator"
)

func main() {

	emitter_uri := flag.String("emitter-uri", "repo://", "A valid whosonfirst/go-whosonfirst-iterate/emitter URI")
	
     	flag.Parse()

	ctx := context.Background()

	emitter_cb := func(ctx context.Context, path string, fh io.ReadSeeker, args ...interface{}) error {
		log.Printf("Indexing %s\n", path)
		return nil
	}

	iter, _ := iterator.NewIterator(ctx, *emitter_uri, emitter_cb)

	uris := flag.Args()
	iter.IterateURIs(ctx, uris...)
}

Iterators

Iterators are defined as standalone packages implementing the Iterator interface:

// Iterator defines an interface for iterating through collections  of Who's On First documents.
type Iterator interface {
	// Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in one or more URIs.
	Iterate(context.Context, ...string) iter.Seq2[*Record, error]
	// Seen() returns the total number of records processed so far.
	Seen() int64
	// IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
	IsIterating() bool
	// Close performs any implementation specific tasks before terminating the iterator.
	Close() error	
}

Then, at the package level, they are "registered" with the iterate package so that they can be invoked using a simple declarative URI syntax. For example:

func init() {
	ctx := context.Background()
	err := RegisterIterator(ctx, "cwd", NewCwdIterator)

	if err != nil {
		panic(err)
	}
}

And then:

it, err := iterate.NewIterator(ctx, "cwd://")
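
The register-then-dispatch pattern described above can be sketched with the standard library alone. This is an illustration of the general idea, not the package's implementation: initFunc, register and newFromURI are hypothetical names, and the constructor returns a string where the real IteratorInitializationFunc returns an Iterator.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// initFunc is a hypothetical constructor signature used only in this sketch.
type initFunc func(uri string) (string, error)

var (
	mu       sync.RWMutex
	registry = map[string]initFunc{}
)

// register associates a URI scheme with a constructor and rejects duplicates.
func register(scheme string, f initFunc) error {
	mu.Lock()
	defer mu.Unlock()
	if _, exists := registry[scheme]; exists {
		return fmt.Errorf("scheme %q is already registered", scheme)
	}
	registry[scheme] = f
	return nil
}

// newFromURI parses uri and calls the constructor registered for its scheme.
func newFromURI(uri string) (string, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return "", fmt.Errorf("failed to parse URI: %w", err)
	}
	mu.RLock()
	f, ok := registry[u.Scheme]
	mu.RUnlock()
	if !ok {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	return f(uri)
}

func main() {
	err := register("cwd", func(uri string) (string, error) {
		return "cwd iterator for " + uri, nil
	})
	if err != nil {
		panic(err)
	}
	it, err := newFromURI("cwd://")
	fmt.Println(it, err)
}
```

Registering from an init function means that importing a package is enough to make its scheme available.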

Importantly, Iterator implementations that are "registered" are wrapped in a second (internal) Iterator implementation that provides for concurrent processing, retries and regular-expression based file inclusion and exclusion rules. These criteria are defined using query parameters appended to the initial iterator URI that are prefixed with an "_" character. For example:

it, err := iterate.NewIterator(ctx, `cwd://?_exclude=.*\.txt$`)
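
As a rough illustration of how "_"-prefixed parameters might be separated from the parameters meant for the underlying implementation, here is a standard-library sketch. The splitParams and excluded helpers are hypothetical and only approximate the behaviour described above. Note that url.Values decodes "+" as a space, so regular expressions containing "+" would need to be percent-encoded in a URI.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// splitParams separates "_"-prefixed query parameters from the rest.
func splitParams(uri string) (wrapper url.Values, impl url.Values, err error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, nil, err
	}
	wrapper, impl = url.Values{}, url.Values{}
	for k, vs := range u.Query() {
		if strings.HasPrefix(k, "_") {
			wrapper[k] = vs
		} else {
			impl[k] = vs
		}
	}
	return wrapper, impl, nil
}

// excluded reports whether path matches the "_exclude" pattern, if one is set.
func excluded(wrapper url.Values, path string) (bool, error) {
	pat := wrapper.Get("_exclude")
	if pat == "" {
		return false, nil
	}
	re, err := regexp.Compile(pat)
	if err != nil {
		return false, err
	}
	return re.MatchString(path), nil
}

func main() {
	wrapper, impl, err := splitParams(`cwd://?_exclude=.*\.txt$&include=properties.wof:placetype=region`)
	if err != nil {
		panic(err)
	}
	skip, err := excluded(wrapper, "notes.txt")
	fmt.Println(skip, err, impl.Get("include"))
}
```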

The following iterator schemes are supported by default:

cwd://

CwdIterator implements the Iterator interface for crawling records in the current working directory.

directory://

DirectoryIterator implements the Iterator interface for crawling records in a directory.

featurecollection://

FeatureCollectionIterator implements the Iterator interface for crawling features in a GeoJSON FeatureCollection record.

file://

FileIterator implements the Iterator interface for crawling individual file records.

filelist://

FileListIterator implements the Iterator interface for crawling records listed in a "file list" (a plain text newline-delimited list of files).

fs://

FSIterator implements the Iterator interface for crawling records listed in a fs.FS instance. For example:

package main

import (
	"context"
	"flag"
	"io/fs"
	"log"

	"github.com/whosonfirst/go-whosonfirst-iterate/v3"
)

func main() {

	var iterator_uri string

	flag.StringVar(&iterator_uri, "iterator-uri", "fs://", "A registered whosonfirst/go-whosonfirst-iterate/v3.Iterator URI.")
	flag.Parse()

	ctx := context.Background()

	// Your fs.FS goes here
	var your_fs fs.FS

	iter, _ := iterate.NewFSIterator(ctx, iterator_uri, your_fs)

	for rec, _ := range iter.Iterate(ctx, ".") {
		log.Printf("Indexing %s\n", rec.Path)
		// Close each record body once it has been handled rather than
		// deferring every Close call until main returns.
		rec.Body.Close()
	}
}

Notes:

  • The go-whosonfirst-iterate-fs/v3 implementation does NOT register itself with the whosonfirst/go-whosonfirst-iterate.RegisterIterator method and is NOT instantiated using the whosonfirst/go-whosonfirst-iterate.NewIterator method since fs.FS instances cannot be defined as URI constructs.
  • Under the hood, NewFSIterator wraps an FSIterator instance in a whosonfirst/go-whosonfirst-iterate.concurrentIterator instance to provide for throttling, filtering and other common (configurable) operations.

geojsonl://

GeoJSONLIterator implements the Iterator interface for crawling features in a line-separated GeoJSON record.

null://

NullIterator implements the Iterator interface for appearing to crawl records but not doing anything.

repo://

RepoIterator implements the Iterator interface for crawling records in a Who's On First style data directory.

Query parameters

The following query parameters are honoured by all iterate.Iterator instances:

| Name | Value | Required | Notes |
| --- | --- | --- | --- |
| include | String | No | One or more query filters (described below) to limit documents that will be processed. |
| exclude | String | No | One or more query filters (described below) for excluding documents from being processed. |

The following query parameters are honoured for iterate.Iterator URIs passed to the iterate.NewIterator method:

| Name | Value | Required | Notes |
| --- | --- | --- | --- |
| _max_procs | Int | No | The maximum number of processes to use for iterating documents simultaneously. Defaults to the value of runtime.NumCPU(). |
| _include | String | No | A valid regular expression for paths (URIs) to include for processing. |
| _exclude | String | No | A valid regular expression for paths (URIs) to exclude from processing. |
| _exclude_alt | Bool | No | If true do not process "alternate geometry" files. |
| _retry | Bool | No | A boolean flag signaling that if a URI being walked fails it should be retried. Used in conjunction with the _max_retries and _retry_after parameters. |
| _max_retries | Int | No | The maximum number of attempts to walk any given URI. Defaults to "1" and the _retry parameter must evaluate to a true value in order to change the default. |
| _retry_after | Int | No | The number of seconds to wait between attempts to walk any given URI. Defaults to "10" (seconds) and the _retry parameter must evaluate to a true value in order to change the default. |
| _dedupe | Bool | No | A boolean value to track and skip records (specifically their relative URI) that have already been processed. |
| _with_stats | Bool | No | Boolean flag indicating whether stats should be logged. Default is true. |
| _stats_interval | Int | No | The number of seconds between stats logging events. Default is 60. |
| _stats_level | String | No | The (slog/log) level at which stats are logged. Default is INFO. |

Filters

QueryFilters

You can also specify inline queries by appending one or more include or exclude parameters to an iterate.Iterator URI, where the value is a string in the format of:

{PATH}={REGULAR EXPRESSION}

Paths follow the dot notation syntax used by the tidwall/gjson package and regular expressions are any valid Go language regular expression. Successful path lookups will be treated as a list of candidates and each candidate's string value will be tested against the regular expression's MatchString method.

For example:

repo://?include=properties.wof:placetype=region

You can pass multiple query parameters. For example:

repo://?include=properties.wof:placetype=region&include=properties.wof:name=(?i)new.*

The default query mode is to ensure that all queries match, but you can also specify that only one or more queries need to match by appending an include_mode or exclude_mode parameter where the value is either "ANY" or "ALL".
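
The following sketch shows one way the {PATH}={REGULAR EXPRESSION} format and the "ANY" and "ALL" modes could be evaluated. It uses only the standard library, so the lookup function is a simplified stand-in for gjson path syntax (plain dot-separated object keys, string values only). The query type and the parseQuery and matches functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// query is a hypothetical parsed form of "{PATH}={REGULAR EXPRESSION}".
type query struct {
	path []string
	re   *regexp.Regexp
}

// parseQuery splits s on the first "=" into a path and a regular expression.
func parseQuery(s string) (*query, error) {
	path, expr, ok := strings.Cut(s, "=")
	if !ok {
		return nil, fmt.Errorf("invalid query %q", s)
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	return &query{path: strings.Split(path, "."), re: re}, nil
}

// lookup walks doc along a dot-separated path of object keys.
func lookup(doc any, path []string) (string, bool) {
	cur := doc
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return "", false
		}
		if cur, ok = m[key]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// matches applies queries to body; mode is "ANY" or "ALL" (the default).
func matches(body []byte, mode string, queries []*query) (bool, error) {
	var doc any
	if err := json.Unmarshal(body, &doc); err != nil {
		return false, err
	}
	matched := 0
	for _, q := range queries {
		if v, ok := lookup(doc, q.path); ok && q.re.MatchString(v) {
			matched++
		}
	}
	if mode == "ANY" {
		return matched > 0, nil
	}
	return matched == len(queries), nil
}

func main() {
	body := []byte(`{"properties":{"wof:placetype":"region","wof:name":"New Brunswick"}}`)
	q1, _ := parseQuery("properties.wof:placetype=region")
	q2, _ := parseQuery("properties.wof:name=(?i)new.*")
	ok, err := matches(body, "ALL", []*query{q1, q2})
	fmt.Println(ok, err)
}
```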

Tools

$> make cli
go build -mod vendor -o bin/count cmd/count/main.go
go build -mod vendor -o bin/emit cmd/emit/main.go

count

Count files in one or more whosonfirst/go-whosonfirst-iterate/v3 iterator sources.

$> ./bin/count -h
Count files in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources.
Usage:
	 ./bin/count [options] uri(N) uri(N)
Valid options are:

  -iterator-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/v3.Iterator URI. Supported iterator URI schemes are: cwd://,directory://,featurecollection://,file://,filelist://,geojsonl://,null://,repo:// (default "repo://")

For example:

$> ./bin/count fixtures
2025/06/23 08:26:59 INFO Counted records count=37 time=9.216979ms

emit

Emit records in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources as structured data.

$> ./bin/emit -h
Emit records in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources as structured data.
Usage:
	 ./bin/emit [options] uri(N) uri(N)
Valid options are:

  -geojson
    	Emit features as a well-formed GeoJSON FeatureCollection record.
  -iterator-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/v3.Iterator URI. Supported iterator URI schemes are: cwd://,directory://,featurecollection://,file://,filelist://,geojsonl://,null://,repo:// (default "repo://")
  -json
    	Emit features as a well-formed JSON array.
  -null
    	Publish features to /dev/null
  -stdout
    	Publish features to STDOUT. (default true)

For example:

$> ./bin/emit \
	-iterator-uri 'repo://?include=properties.sfomuseum:placetype=museum' \
	-geojson \
	fixtures \
	| jq '.features[]["properties"]["wof:id"]'

1360391311
1360391313
1360391315
1360391317
1360391321
1360391323
1360391325
1360391327
1360391329
...and so on

Notes about writing your own iterate.Iterator implementation.

Under the hood all iterate.Iterator instances are wrapped using the (private) concurrentIterator implementation. This is the code that implements throttling, file matching and other common tasks. That happens automatically when code calls iterate.NewIterator but you do need to make sure that you "register" your custom implementation, for example:

package custom

import (
	"context"

	"github.com/whosonfirst/go-whosonfirst-iterate/v3"
)

func init() {

	ctx := context.Background()
	err := iterate.RegisterIterator(ctx, "custom", NewCustomIterator)

	if err != nil {
		panic(err)
	}
}

type CustomIterator struct {
	iterate.Iterator
}

func NewCustomIterator(ctx context.Context, uri string) (iterate.Iterator, error) {
	it := &CustomIterator{}
	return it, nil
}

// The rest of the iterate.Iterator interface goes here...

Documentation

Overview

Package iterator provides methods and utilities for iterating over a collection of records (presumed but not required to be Who's On First records) from a variety of sources and dispatching processing to user-defined callback functions.

Package iterate provides interfaces for iterating through a set of Who's On First documents.

Index

Constants

const STDIN string = "STDIN"

STDIN is a constant value signaling that a record was read from `STDIN` and has no URI (path).

Variables

This section is empty.

Functions

func ApplyFilters

func ApplyFilters(ctx context.Context, r io.ReadSeeker, f filters.Filters) (bool, error)

ApplyFilters is a convenience method to test whether 'r' matches all the filters defined by 'f' and also "rewinds" 'r' before returning.

func IteratorSchemes

func IteratorSchemes() []string

IteratorSchemes() returns the list of schemes that have been "registered".

func ReaderWithPath

func ReaderWithPath(ctx context.Context, abs_path string) (io.ReadSeekCloser, error)

ReaderWithPath returns a new `io.ReadSeekCloser` instance derived from 'abs_path'.

func RegisterIterator

func RegisterIterator(ctx context.Context, scheme string, f IteratorInitializationFunc) error

RegisterIterator() associates 'scheme' with 'f' in an internal list of available `Iterator` implementations.

func ScrubURI added in v3.1.1

func ScrubURI(uri string) (string, error)

ScrubURI attempts to remove sensitive data (parameters, etc.) from 'uri' and return a new string (URI) which is safe to include in logging messages.

Types

type CwdIterator

type CwdIterator struct {
	Iterator
	// contains filtered or unexported fields
}

CwdIterator implements the `Iterator` interface for crawling records in the current working directory.

func (*CwdIterator) Close

func (it *CwdIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*CwdIterator) IsIterating

func (it *CwdIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*CwdIterator) Iterate

func (it *CwdIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*CwdIterator) Seen

func (it *CwdIterator) Seen() int64

Seen() returns the total number of records processed so far.

type DirectoryIterator

type DirectoryIterator struct {
	Iterator
	// contains filtered or unexported fields
}

DirectoryIterator implements the `Iterator` interface for crawling records in a directory.

func (*DirectoryIterator) Close

func (it *DirectoryIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*DirectoryIterator) IsIterating

func (it *DirectoryIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*DirectoryIterator) Iterate

func (it *DirectoryIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*DirectoryIterator) Seen

func (it *DirectoryIterator) Seen() int64

Seen() returns the total number of records processed so far.

type FSIterator

type FSIterator struct {
	Iterator
	// contains filtered or unexported fields
}

FSIterator implements the `Iterator` interface for crawling records listed in a fs.FS instance.

func (*FSIterator) Close

func (it *FSIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*FSIterator) IsIterating

func (it *FSIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*FSIterator) Iterate

func (it *FSIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*FSIterator) Seen

func (it *FSIterator) Seen() int64

Seen() returns the total number of records processed so far.

type FeatureCollectionIterator

type FeatureCollectionIterator struct {
	Iterator
	// contains filtered or unexported fields
}

FeatureCollectionIterator implements the `Iterator` interface for crawling features in a GeoJSON FeatureCollection record.

func (*FeatureCollectionIterator) Close

func (it *FeatureCollectionIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*FeatureCollectionIterator) IsIterating

func (it *FeatureCollectionIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*FeatureCollectionIterator) Iterate

func (it *FeatureCollectionIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*FeatureCollectionIterator) Seen

func (it *FeatureCollectionIterator) Seen() int64

Seen() returns the total number of records processed so far.

type FileIterator

type FileIterator struct {
	Iterator
	// contains filtered or unexported fields
}

FileIterator implements the `Iterator` interface for crawling individual file records.

func (*FileIterator) Close

func (it *FileIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*FileIterator) IsIterating

func (it *FileIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*FileIterator) Iterate

func (it *FileIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*FileIterator) Seen

func (it *FileIterator) Seen() int64

Seen() returns the total number of records processed so far.

type FileListIterator

type FileListIterator struct {
	Iterator
	// contains filtered or unexported fields
}

FileListIterator implements the `Iterator` interface for crawling records listed in a "file list" (a plain text newline-delimited list of files).

func (*FileListIterator) Close

func (it *FileListIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*FileListIterator) IsIterating

func (it *FileListIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*FileListIterator) Iterate

func (it *FileListIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*FileListIterator) Seen

func (it *FileListIterator) Seen() int64

Seen() returns the total number of records processed so far.

type GeoJSONLIterator

type GeoJSONLIterator struct {
	Iterator
	// contains filtered or unexported fields
}

GeoJSONLIterator implements the `Iterator` interface for crawling features in a line-separated GeoJSON record.

func (*GeoJSONLIterator) Close

func (it *GeoJSONLIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*GeoJSONLIterator) IsIterating

func (it *GeoJSONLIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*GeoJSONLIterator) Iterate

func (it *GeoJSONLIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*GeoJSONLIterator) Seen

func (it *GeoJSONLIterator) Seen() int64

Seen() returns the total number of records processed so far.

type Iterator

type Iterator interface {
	// Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in one or more URIs.
	Iterate(context.Context, ...string) iter.Seq2[*Record, error]
	// Seen() returns the total number of records processed so far.
	Seen() int64
	// IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
	IsIterating() bool
	// Close performs any implementation specific tasks before terminating the iterator.
	Close() error
}

Iterator defines an interface for iterating through collections of Who's On First documents.

func NewConcurrentIterator

func NewConcurrentIterator(ctx context.Context, iterator_uri string, it Iterator) (Iterator, error)

NewConcurrentIterator() returns a new `Iterator` instance derived from 'iterator_uri' and 'it'. The former is expected to be a valid `whosonfirst/go-whosonfirst-iterate/v3.Iterator` URI defined by the following parameters:

  • `?_max_procs=` Explicitly set the maximum number of processes to use for iterating documents simultaneously. (Default is the value of `runtime.NumCPU()`.)
  • `?_exclude=` A valid regular expression used to test and exclude (if matching) the paths of documents as they are iterated through.
  • `?_exclude_alt_files=` A boolean value indicating whether Who's On First style "alternate geometry" file paths should be excluded. (Default is false.)
  • `?_include=` A valid regular expression used to test and include (if matching) the paths of documents as they are iterated through.
  • `?_dedupe=` A boolean value to track and skip records (specifically their relative URI) that have already been processed.
  • `?_retry=` A boolean value indicating whether failed iterators should be retried. (Default is false.)
  • `?_max_attempts=` The number of times to retry a failed iterator. (Default is 1.)
  • `?_retry_after=` The number of seconds to wait before retrying a failed iterator. (Default is 10.)
  • `?_with_stats=` Boolean flag indicating whether stats should be logged. Default is true.
  • `?_stats_interval=` The number of seconds between stats logging events. Default is 60.
  • `?_stats_level=` The (slog/log) level at which stats are logged. Default is INFO.

These parameters will be used to wrap and perform additional checks when iterating through documents using 'it'.

func NewCwdIterator

func NewCwdIterator(ctx context.Context, uri string) (Iterator, error)

NewCwdIterator() returns a new `CwdIterator` instance configured by 'uri' in the form of:

cwd://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewDirectoryIterator

func NewDirectoryIterator(ctx context.Context, uri string) (Iterator, error)

NewDirectoryIterator() returns a new `DirectoryIterator` instance configured by 'uri' in the form of:

directory://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewFSIterator

func NewFSIterator(ctx context.Context, uri string, iterator_fs fs.FS) (Iterator, error)

NewFSIterator() returns a new `FSIterator` instance (wrapped by the `concurrentIterator` implementation) configured by 'uri' for iterating over 'iterator_fs'. Where 'uri' takes the form of:

fs://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
  • `?processes=` An optional number assigning the maximum number of database rows that will be processed simultaneously. (Default is defined by `runtime.NumCPU()`.)

func NewFeatureCollectionIterator

func NewFeatureCollectionIterator(ctx context.Context, uri string) (Iterator, error)

NewFeatureCollectionIterator() returns a new `FeatureCollectionIterator` instance configured by 'uri' in the form of:

featurecollection://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewFileIterator

func NewFileIterator(ctx context.Context, uri string) (Iterator, error)

NewFileIterator() returns a new `FileIterator` instance configured by 'uri' in the form of:

file://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewFileListIterator

func NewFileListIterator(ctx context.Context, uri string) (Iterator, error)

NewFileListIterator() returns a new `FileListIterator` instance configured by 'uri' in the form of:

filelist://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewGeoJSONLIterator

func NewGeoJSONLIterator(ctx context.Context, uri string) (Iterator, error)

NewGeoJSONLIterator() returns a new `GeoJSONLIterator` instance configured by 'uri' in the form of:

geojsonl://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

func NewIterator

func NewIterator(ctx context.Context, uri string) (Iterator, error)

NewIterator() returns a new `Iterator` instance derived from 'uri'. The semantics of and requirements for 'uri' are specific to the package implementing the interface.

func NewNullIterator

func NewNullIterator(ctx context.Context, uri string) (Iterator, error)

NewNullIterator() returns a new `NullIterator` instance configured by 'uri' in the form of:

null://

func NewRepoIterator

func NewRepoIterator(ctx context.Context, uri string) (Iterator, error)

NewRepoIterator() returns a new `RepoIterator` instance configured by 'uri' in the form of:

repo://?{PARAMETERS}

Where {PARAMETERS} may be:

  • `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
  • `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
  • `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
  • `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.

type IteratorInitializationFunc

type IteratorInitializationFunc func(ctx context.Context, uri string) (Iterator, error)

IteratorInitializationFunc is a function defined by individual iterator packages and used to create an instance of that iterator.

type NullIterator

type NullIterator struct {
	Iterator
}

NullIterator implements the `Iterator` interface for appearing to crawl records but not doing anything.

func (*NullIterator) Close

func (it *NullIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*NullIterator) IsIterator

func (it *NullIterator) IsIterator() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*NullIterator) Iterate

func (it *NullIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate() does nothing.

func (*NullIterator) Seen

func (it *NullIterator) Seen() int64

Seen() returns the total number of records processed so far.

type Record

type Record struct {
	// Path is the URI of the record. This will vary from one `whosonfirst/go-whosonfirst-iterate/v3.Iterator`
	// implementation to the next.
	Path string
	// Body is an `io.ReadSeekCloser` containing the body of the record.
	Body io.ReadSeekCloser
}

Record is a struct wrapping the details of records processed by a `whosonfirst/go-whosonfirst-iterate/v3.Iterator` instance.

func NewRecord

func NewRecord(path string, r io.ReadSeekCloser) *Record

NewRecord returns a new `Record` instance wrapping 'path' and 'r'.

type RepoIterator

type RepoIterator struct {
	Iterator
	// contains filtered or unexported fields
}

RepoIterator implements the `Iterator` interface for crawling records in a Who's On First style data directory.

func (*RepoIterator) Close

func (it *RepoIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*RepoIterator) IsIterating

func (it *RepoIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*RepoIterator) Iterate

func (it *RepoIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]

Iterate will return an `iter.Seq2[*Record, error]` for each record encountered in 'uris'.

func (*RepoIterator) Seen

func (it *RepoIterator) Seen() int64

Seen() returns the total number of records processed so far.

Directories

Path Synopsis
app
cmd
count command
emit command
filters Package filters defines interfaces for filtering documents which should be processed during an iteration.
