Documentation ¶
Index ¶
- func CacheIncomingFile(r io.Reader, path string) error
- type Config
- type Database
- func (db *Database) AddDataset(ds *Dataset) error
- func (db *Database) DatasetPath(ds *Dataset) string
- func (db *Database) Drop() error
- func (db *Database) GetDataset(name, version string, latest bool) (*Dataset, error)
- func (db *Database) GetDatasetByVersion(name, version string) (*Dataset, error)
- func (db *Database) GetDatasetLatest(name string) (*Dataset, error)
- func (db *Database) LoadDatasetFromMap(name string, data map[string][]string) (*Dataset, error)
- func (db *Database) LoadDatasetFromReaderAuto(name string, r io.Reader) (*Dataset, error)
- func (db *Database) LoadSampleData(sampleDir fs.FS) error
- func (db *Database) ReadColumnsFromStripeByNames(ds *Dataset, stripe Stripe, columns []string) (map[string]column.Chunk, int, error)
- type Dataset
- type ObjectType
- type RowReader
- type Stripe
- type StripeReader
- type UID
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Config ¶
type Config struct {
	WorkingDirectory  string `json:"-"` // not exposing this in our json representation as the db can be moved around
	CreatedTimestamp  int64  `json:"created_timestamp"`
	DatabaseID        UID    `json:"database_id"`
	MaxRowsPerStripe  int    `json:"max_rows_per_stripe"`
	MaxBytesPerStripe int    `json:"max_bytes_per_stripe"`
}
Config sets some high level properties for a new Database. It's useful for testing or for passing settings based on cli flags.
type Database ¶
type Database struct {
	sync.Mutex
	Datasets    []*Dataset
	ServerHTTP  *http.Server
	ServerHTTPS *http.Server
	Config      *Config
}
Database is the main struct that contains it all - notably the datasets' metadata and the webserver. Having the webserver here makes it convenient for testing - we can spawn new servers at a moment's notice.
func NewDatabase ¶
NewDatabase initiates a new database object and binds it to a given directory. If the directory doesn't exist, it creates it. If it exists, it loads the data contained within.
func (*Database) AddDataset ¶
AddDataset adds a Dataset to a Database. This is a pretty rare event, so we don't expect much contention - the locking is just to avoid some issues when marshaling the object around in the API etc.
func (*Database) DatasetPath ¶
DatasetPath returns the path of a given dataset (all its stripes live there). ARCH: consider merging this with dataPath, based on a nullable dataset argument (like manifestPath).
func (*Database) GetDataset ¶
func (*Database) GetDatasetByVersion ¶ added in v0.1.3
GetDataset retrieves a dataset based on its UID. OPTIM: this implementation is not efficient, but we don't have a map-like structure to store our datasets - we keep them in a slice so that we have a predictable order -> we need a sorted map.
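The OPTIM note above boils down to a linear scan over the dataset slice. A minimal sketch of that pattern, using a toy stand-in type (the real lookups go by UID or by name/version as the signatures above show):

```go
package main

import "fmt"

// dataset is a minimal stand-in for the package's Dataset, just enough
// to show the linear scan the OPTIM note describes.
type dataset struct {
	Name    string
	Version string
}

// findByNameVersion scans the slice front to back - O(n), but the slice
// preserves insertion order, which is why a plain map isn't used.
func findByNameVersion(datasets []*dataset, name, version string) *dataset {
	for _, ds := range datasets {
		if ds.Name == name && ds.Version == version {
			return ds
		}
	}
	return nil
}

func main() {
	datasets := []*dataset{
		{Name: "sales", Version: "v1"},
		{Name: "sales", Version: "v2"},
		{Name: "users", Version: "v1"},
	}
	ds := findByNameVersion(datasets, "sales", "v2")
	fmt.Println(ds != nil) // true
}
```

A sorted map (or a map index maintained alongside the slice) would make this O(log n) or O(1) while keeping the predictable iteration order.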
func (*Database) GetDatasetLatest ¶ added in v0.1.3
func (*Database) LoadDatasetFromMap ¶
LoadDatasetFromMap allows for an easy setup of a new dataset, mostly useful for tests. It converts the map into an in-memory CSV file and passes it to our usual routines. OPTIM: the underlying call (LoadDatasetFromReaderAuto) caches this raw data on disk, which may be unnecessary.
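The map-to-CSV conversion can be sketched with encoding/csv. The helper below is illustrative, not the package's internal routine: since Go map iteration order is random, it sorts the column names to keep the output deterministic - the real implementation may order columns differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// mapToCSV renders a column-name -> values map as an in-memory CSV
// (header row first), ready to hand to any reader-based loader.
func mapToCSV(data map[string][]string) (string, error) {
	names := make([]string, 0, len(data))
	nrows := 0
	for name, values := range data {
		names = append(names, name)
		if len(values) > nrows {
			nrows = len(values)
		}
	}
	// Map iteration order is random; sort for a deterministic header.
	sort.Strings(names)

	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(names); err != nil {
		return "", err
	}
	for row := 0; row < nrows; row++ {
		record := make([]string, len(names))
		for j, name := range names {
			if row < len(data[name]) {
				record[j] = data[name][row]
			}
		}
		if err := w.Write(record); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := mapToCSV(map[string][]string{
		"id":   {"1", "2"},
		"name": {"alice", "bob"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```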
func (*Database) LoadDatasetFromReaderAuto ¶
LoadDatasetFromReaderAuto loads data from a reader and returns a Dataset
func (*Database) LoadSampleData ¶
LoadSampleData reads all CSVs from a given directory and loads them into the database using default settings.
func (*Database) ReadColumnsFromStripeByNames ¶
func (db *Database) ReadColumnsFromStripeByNames(ds *Dataset, stripe Stripe, columns []string) (map[string]column.Chunk, int, error)
OPTIM: perhaps reorder the column requests so that they are contiguous, or at least in order; also add a benchmark that reads columns in reverse to see if we get any benefit from this.
type Dataset ¶
type Dataset struct {
	ID   UID    `json:"id"`
	Name string `json:"name"`
	// ARCH: move the next three to a `Meta` struct?
	Created int64 `json:"created_timestamp"`
	NRows   int64 `json:"nrows"`
	// ARCH: note that we'd ideally get this as the uncompressed size... might be tricky to get
	SizeRaw    int64              `json:"size_raw"`
	SizeOnDisk int64              `json:"size_on_disk"`
	Schema     column.TableSchema `json:"schema"`
	// TODO/OPTIM: we need the following for manifests, but it's unnecessary for writing in our
	// web requests - remove it from there
	Stripes []Stripe `json:"stripes"`
}
Dataset contains metadata for a given dataset, which at this point means a table.
type ObjectType ¶
type ObjectType uint8
ObjectType denotes what type an object is (or its ID) - dataset, stripe etc.
const (
	OtypeNone ObjectType = iota
	OtypeDatabase
	OtypeDataset
	OtypeStripe
)
Object types are reflected in the UID - the first two hex characters encode the object type, so it's clear what sort of object you're dealing with based on its prefix.
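The prefix scheme described above can be sketched as follows, assuming the first byte of the binary UID holds the object type (the payload layout here is made up for illustration - only the two-hex-character prefix matches the documentation):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

type objectType uint8

const (
	otypeNone objectType = iota
	otypeDatabase
	otypeDataset
	otypeStripe
)

// newUIDHex puts the object type in the first byte, so the two leading
// hex characters of the encoded UID identify the kind of object.
func newUIDHex(otype objectType) string {
	raw := make([]byte, 9)
	raw[0] = byte(otype)
	// Fill the rest with random payload (illustrative only).
	if _, err := rand.Read(raw[1:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(raw)
}

func main() {
	uid := newUIDHex(otypeDataset)
	fmt.Println(uid[:2]) // "02": hex of otypeDataset
}
```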
type RowReader ¶
type Stripe ¶
type Stripe struct {
	Id      UID      `json:"id"`
	Length  int      `json:"length"`
	Offsets []uint32 `json:"offsets"`
}
Stripe only contains metadata about a given stripe; it has to be loaded separately to obtain the actual data.
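One plausible reading of the Offsets field - an assumption for illustration, not the package's spec - is that consecutive offsets delimit the column chunks inside the stripe's on-disk blob, which would be how a reader locates the nth column:

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkAt slices the nth column chunk out of a stripe blob, assuming
// offsets[n] and offsets[n+1] delimit chunk n (hypothetical layout).
func chunkAt(blob []byte, offsets []uint32, n int) ([]byte, error) {
	if n < 0 || n+1 >= len(offsets) {
		return nil, fmt.Errorf("column %d out of range", n)
	}
	start, end := offsets[n], offsets[n+1]
	if int(end) > len(blob) || start > end {
		return nil, fmt.Errorf("corrupt offsets for column %d", n)
	}
	return blob[start:end], nil
}

func main() {
	// Two "columns" packed back to back; a trailing offset marks the end.
	blob := append([]byte("aaaa"), []byte("bb")...)
	offsets := []uint32{0, 4, 6}
	chunk, err := chunkAt(blob, offsets, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(chunk, []byte("bb"))) // true
}
```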
type StripeReader ¶
type StripeReader struct {
// contains filtered or unexported fields
}
func NewStripeReader ¶
func NewStripeReader(db *Database, ds *Dataset, stripe Stripe) (*StripeReader, error)
OPTIM: pass in a bytes buffer to reuse it?
func (*StripeReader) Close ¶
func (sr *StripeReader) Close() error
func (*StripeReader) ReadColumn ¶
func (sr *StripeReader) ReadColumn(nthColumn int) (column.Chunk, error)
type UID ¶
type UID struct {
	Otype ObjectType
	// contains filtered or unexported fields
}
UID is a unique ID for a given object; it's NOT a UUID.
func UIDFromHex ¶
ARCH: test this instead of Unmarshal? Or both?
func (UID) MarshalJSON ¶
MarshalJSON satisfies the Marshaler interface, so that we can automatically marshal UIDs as JSON
func (*UID) UnmarshalJSON ¶
UnmarshalJSON satisfies the Unmarshaler interface (we need a pointer here, because we'll be writing to it)