Overview ¶
Package iterate provides interfaces and utilities for iterating over a collection of records (presumed, but not required, to be Who's On First documents) from a variety of sources. Each `Iterator` implementation yields its records to the caller as an `iter.Seq2[*Record, error]` sequence.
Index ¶
- Constants
- func ApplyFilters(ctx context.Context, r io.ReadSeeker, f filters.Filters) (bool, error)
- func IteratorSchemes() []string
- func ReaderWithPath(ctx context.Context, abs_path string) (io.ReadSeekCloser, error)
- func RegisterIterator(ctx context.Context, scheme string, f IteratorInitializationFunc) error
- func ScrubURI(uri string) (string, error)
- type CwdIterator
- type DirectoryIterator
- type FSIterator
- type FeatureCollectionIterator
- type FileIterator
- type FileListIterator
- type GeoJSONLIterator
- type Iterator
- func NewConcurrentIterator(ctx context.Context, iterator_uri string, it Iterator) (Iterator, error)
- func NewCwdIterator(ctx context.Context, uri string) (Iterator, error)
- func NewDirectoryIterator(ctx context.Context, uri string) (Iterator, error)
- func NewFSIterator(ctx context.Context, uri string, iterator_fs fs.FS) (Iterator, error)
- func NewFeatureCollectionIterator(ctx context.Context, uri string) (Iterator, error)
- func NewFileIterator(ctx context.Context, uri string) (Iterator, error)
- func NewFileListIterator(ctx context.Context, uri string) (Iterator, error)
- func NewGeoJSONLIterator(ctx context.Context, uri string) (Iterator, error)
- func NewIterator(ctx context.Context, uri string) (Iterator, error)
- func NewNullIterator(ctx context.Context, uri string) (Iterator, error)
- func NewRepoIterator(ctx context.Context, uri string) (Iterator, error)
- type IteratorInitializationFunc
- type NullIterator
- type Record
- type RepoIterator
Constants ¶
const STDIN string = "STDIN"
STDIN is a constant value signaling that a record was read from `STDIN` and has no URI (path).
Variables ¶
This section is empty.
Functions ¶
func ApplyFilters ¶
func ApplyFilters(ctx context.Context, r io.ReadSeeker, f filters.Filters) (bool, error)
ApplyFilters is a convenience method that tests whether 'r' matches all the filters defined by 'f' and then "rewinds" 'r' before returning.
func IteratorSchemes ¶
func IteratorSchemes() []string
IteratorSchemes() returns the list of schemes that have been "registered".
func ReaderWithPath ¶
func ReaderWithPath(ctx context.Context, abs_path string) (io.ReadSeekCloser, error)
ReaderWithPath returns a new `io.ReadSeekCloser` instance derived from 'abs_path'.
func RegisterIterator ¶
func RegisterIterator(ctx context.Context, scheme string, f IteratorInitializationFunc) error
RegisterIterator() associates 'scheme' with 'f' in an internal list of available `Iterator` implementations.
Types ¶
type CwdIterator ¶
type CwdIterator struct {
Iterator
// contains filtered or unexported fields
}
CwdIterator implements the `Iterator` interface for crawling records in a Who's On First style data directory.
func (*CwdIterator) Close ¶
func (it *CwdIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*CwdIterator) IsIterating ¶
func (it *CwdIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*CwdIterator) Iterate ¶
func (it *CwdIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.
func (*CwdIterator) Seen ¶
func (it *CwdIterator) Seen() int64
Seen() returns the total number of records processed so far.
type DirectoryIterator ¶
type DirectoryIterator struct {
Iterator
// contains filtered or unexported fields
}
DirectoryIterator implements the `Iterator` interface for crawling records in a directory.
func (*DirectoryIterator) Close ¶
func (it *DirectoryIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*DirectoryIterator) IsIterating ¶
func (it *DirectoryIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*DirectoryIterator) Iterate ¶
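func (it *DirectoryIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.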
func (*DirectoryIterator) Seen ¶
func (it *DirectoryIterator) Seen() int64
Seen() returns the total number of records processed so far.
type FSIterator ¶
type FSIterator struct {
Iterator
// contains filtered or unexported fields
}
FSIterator implements the `Iterator` interface for crawling records in an `fs.FS` filesystem.
func (*FSIterator) Close ¶
func (it *FSIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*FSIterator) IsIterating ¶
func (it *FSIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*FSIterator) Iterate ¶
func (it *FSIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.
func (*FSIterator) Seen ¶
func (it *FSIterator) Seen() int64
Seen() returns the total number of records processed so far.
type FeatureCollectionIterator ¶
type FeatureCollectionIterator struct {
Iterator
// contains filtered or unexported fields
}
FeatureCollectionIterator implements the `Iterator` interface for crawling features in a GeoJSON FeatureCollection record.
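One way to split a FeatureCollection into per-feature bodies is to decode the `features` array as `json.RawMessage` values, which avoids fully parsing each geometry. This is a hypothetical sketch using only `encoding/json`; how the package assigns a `Path` to each feature is not documented here, so the sketch leaves it out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// featureCollection decodes only what is needed: each feature is kept as raw
// JSON so it can be handed on as its own record body.
type featureCollection struct {
	Type     string            `json:"type"`
	Features []json.RawMessage `json:"features"`
}

// splitFeatures returns the individual features of a GeoJSON FeatureCollection.
func splitFeatures(body []byte) ([]json.RawMessage, error) {
	var fc featureCollection
	if err := json.Unmarshal(body, &fc); err != nil {
		return nil, err
	}
	if fc.Type != "FeatureCollection" {
		return nil, fmt.Errorf("expected FeatureCollection, got %q", fc.Type)
	}
	return fc.Features, nil
}

func main() {
	doc := `{"type":"FeatureCollection","features":[
		{"type":"Feature","properties":{"wof:id":1}},
		{"type":"Feature","properties":{"wof:id":2}}]}`

	features, err := splitFeatures([]byte(doc))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(features)) // 2
}
```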
func (*FeatureCollectionIterator) Close ¶
func (it *FeatureCollectionIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*FeatureCollectionIterator) IsIterating ¶
func (it *FeatureCollectionIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*FeatureCollectionIterator) Iterate ¶
func (it *FeatureCollectionIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.
func (*FeatureCollectionIterator) Seen ¶
func (it *FeatureCollectionIterator) Seen() int64
Seen() returns the total number of records processed so far.
type FileIterator ¶
type FileIterator struct {
Iterator
// contains filtered or unexported fields
}
FileIterator implements the `Iterator` interface for crawling individual file records.
func (*FileIterator) Close ¶
func (it *FileIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*FileIterator) IsIterating ¶
func (it *FileIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*FileIterator) Iterate ¶
func (it *FileIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.
func (*FileIterator) Seen ¶
func (it *FileIterator) Seen() int64
Seen() returns the total number of records processed so far.
type FileListIterator ¶
type FileListIterator struct {
Iterator
// contains filtered or unexported fields
}
FileListIterator implements the `Iterator` interface for crawling records listed in a "file list" (a plain-text, newline-delimited list of files).
func (*FileListIterator) Close ¶
func (it *FileListIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*FileListIterator) IsIterating ¶
func (it *FileListIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*FileListIterator) Iterate ¶
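func (it *FileListIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.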
func (*FileListIterator) Seen ¶
func (it *FileListIterator) Seen() int64
Seen() returns the total number of records processed so far.
type GeoJSONLIterator ¶
type GeoJSONLIterator struct {
Iterator
// contains filtered or unexported fields
}
GeoJSONLIterator implements the `Iterator` interface for crawling features in a line-separated GeoJSON record.
func (*GeoJSONLIterator) Close ¶
func (it *GeoJSONLIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*GeoJSONLIterator) IsIterating ¶
func (it *GeoJSONLIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*GeoJSONLIterator) Iterate ¶
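func (it *GeoJSONLIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.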
func (*GeoJSONLIterator) Seen ¶
func (it *GeoJSONLIterator) Seen() int64
Seen() returns the total number of records processed so far.
type Iterator ¶
type Iterator interface {
// Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in one or more URIs.
Iterate(context.Context, ...string) iter.Seq2[*Record, error]
// Seen() returns the total number of records processed so far.
Seen() int64
// IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
IsIterating() bool
// Close performs any implementation-specific tasks before terminating the iterator.
Close() error
}
Iterator defines an interface for iterating through collections of Who's On First documents.
func NewConcurrentIterator ¶
func NewConcurrentIterator(ctx context.Context, iterator_uri string, it Iterator) (Iterator, error)
NewConcurrentIterator() returns a new `Iterator` instance derived from 'iterator_uri' and 'it'. The former is expected to be a valid `whosonfirst/go-whosonfirst-iterate/v3.Iterator` URI, which may define the following parameters:
- `?_max_procs=` Explicitly set the maximum number of processes to use for iterating documents simultaneously. (Default is the value of `runtime.NumCPU()`.)
- `?_exclude=` A valid regular expression used to test and exclude (if matching) the paths of documents as they are iterated through.
- `?_exclude_alt_files=` A boolean value indicating whether Who's On First style "alternate geometry" file paths should be excluded. (Default is false.)
- `?_include=` A valid regular expression used to test and include (if matching) the paths of documents as they are iterated through.
- `?_dedupe=` A boolean value indicating whether to track and skip records (specifically, their relative URIs) that have already been processed.
- `?_retry=` A boolean value indicating whether failed iterators should be retried. (Default is false.)
- `?_max_attempts=` The number of times to retry a failed iterator. (Default is 1.)
- `?_retry_after=` The number of seconds to wait before retrying a failed iterator. (Default is 10.)
- `?_with_stats=` A boolean value indicating whether stats should be logged. (Default is true.)
- `?_stats_interval=` The number of seconds between stats logging events. (Default is 60.)
- `?_stats_level=` The (slog/log) level at which stats are logged. (Default is INFO.)
These parameters are used to wrap 'it' and perform additional checks as it iterates through documents.
func NewCwdIterator ¶
func NewCwdIterator(ctx context.Context, uri string) (Iterator, error)
NewCwdIterator() returns a new `CwdIterator` instance configured by 'uri' in the form of:
cwd://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewDirectoryIterator ¶
func NewDirectoryIterator(ctx context.Context, uri string) (Iterator, error)
NewDirectoryIterator() returns a new `DirectoryIterator` instance configured by 'uri' in the form of:
directory://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewFSIterator ¶
func NewFSIterator(ctx context.Context, uri string, iterator_fs fs.FS) (Iterator, error)
NewFSIterator() returns a new `FSIterator` instance (wrapped by the `concurrentIterator` implementation) configured by 'uri' for iterating over 'iterator_fs', where 'uri' takes the form of:
fs://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
- `?processes=` An optional number setting the maximum number of documents that will be processed simultaneously. (Default is defined by `runtime.NumCPU()`.)
func NewFeatureCollectionIterator ¶
func NewFeatureCollectionIterator(ctx context.Context, uri string) (Iterator, error)
NewFeatureCollectionIterator() returns a new `FeatureCollectionIterator` instance configured by 'uri' in the form of:
featurecollection://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewFileIterator ¶
func NewFileIterator(ctx context.Context, uri string) (Iterator, error)
NewFileIterator() returns a new `FileIterator` instance configured by 'uri' in the form of:
file://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewFileListIterator ¶
func NewFileListIterator(ctx context.Context, uri string) (Iterator, error)
NewFileListIterator() returns a new `FileListIterator` instance configured by 'uri' in the form of:
filelist://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewGeoJSONLIterator ¶
func NewGeoJSONLIterator(ctx context.Context, uri string) (Iterator, error)
NewGeoJSONLIterator() returns a new `GeoJSONLIterator` instance configured by 'uri' in the form of:
geojsonl://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
func NewIterator ¶
func NewIterator(ctx context.Context, uri string) (Iterator, error)
NewIterator() returns a new `Iterator` instance derived from 'uri'. The semantics of and requirements for 'uri' are specific to the package implementing the interface.
func NewNullIterator ¶
func NewNullIterator(ctx context.Context, uri string) (Iterator, error)
NewNullIterator() returns a new `NullIterator` instance configured by 'uri' in the form of:
null://
func NewRepoIterator ¶
func NewRepoIterator(ctx context.Context, uri string) (Iterator, error)
NewRepoIterator() returns a new `RepoIterator` instance configured by 'uri' in the form of:
repo://?{PARAMETERS}
Where {PARAMETERS} may be:
- `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
- `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that, if matched, will prevent a document from being considered for further processing.
- `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
- `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
type IteratorInitializationFunc ¶
IteratorInitializationFunc is a function defined by an individual iterator package and used to create an instance of that iterator.
type NullIterator ¶
type NullIterator struct {
Iterator
}
NullIterator implements the `Iterator` interface with an iterator that appears to crawl records but does nothing.
func (*NullIterator) Close ¶
func (it *NullIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*NullIterator) IsIterator ¶
func (it *NullIterator) IsIterator() bool
IsIterator() returns a boolean value indicating whether 'it' is still processing documents.
func (*NullIterator) Seen ¶
func (it *NullIterator) Seen() int64
Seen() returns the total number of records processed so far.
type Record ¶
type Record struct {
// Path is the URI of the record. This will vary from one `whosonfirst/go-whosonfirst-iterate/v3.Iterator`
// implementation to the next.
Path string
// Body is an `io.ReadSeekCloser` containing the body of the record.
Body io.ReadSeekCloser
}
Record is a struct wrapping the details of records processed by a `whosonfirst/go-whosonfirst-iterate/v3.Iterator` instance.
type RepoIterator ¶
type RepoIterator struct {
Iterator
// contains filtered or unexported fields
}
RepoIterator implements the `Iterator` interface for crawling records in a Who's On First style data directory.
func (*RepoIterator) Close ¶
func (it *RepoIterator) Close() error
Close performs any implementation-specific tasks before terminating the iterator.
func (*RepoIterator) IsIterating ¶
func (it *RepoIterator) IsIterating() bool
IsIterating() returns a boolean value indicating whether 'it' is still processing documents.
func (*RepoIterator) Iterate ¶
func (it *RepoIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*Record, error]
Iterate returns an `iter.Seq2[*Record, error]` that yields each record encountered in 'uris'.
func (*RepoIterator) Seen ¶
func (it *RepoIterator) Seen() int64
Seen() returns the total number of records processed so far.