Documentation ¶
Index ¶
- func CacheIncomingFile(r io.Reader, path string) error
- type Config
- type Database
- func (db *Database) AddDataset(ds *Dataset) error
- func (db *Database) DatasetPath(ds *Dataset) string
- func (db *Database) Drop() error
- func (db *Database) GetDataset(name, version string, latest bool) (*Dataset, error)
- func (db *Database) GetDatasetByVersion(name, version string) (*Dataset, error)
- func (db *Database) GetDatasetLatest(name string) (*Dataset, error)
- func (db *Database) LoadDatasetFromMap(name string, data map[string][]string) (*Dataset, error)
- func (db *Database) LoadDatasetFromReaderAuto(name string, r io.Reader) (*Dataset, error)
- func (db *Database) LoadSampleData(sampleDir fs.FS) error
- func (db *Database) ReadColumnsFromStripeByNames(ds *Dataset, stripe Stripe, columns []string) (map[string]column.Chunk, int, error)
- type Dataset
- type ObjectType
- type RowReader
- type Stripe
- type StripeReader
- type UID
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
Types ¶
type Config ¶
type Config struct {
	WorkingDirectory  string `json:"-"` // not exposing this in our json representation as the db can be moved around
	CreatedTimestamp  int64  `json:"created_timestamp"`
	DatabaseID        UID    `json:"database_id"`
	MaxRowsPerStripe  int    `json:"max_rows_per_stripe"`
	MaxBytesPerStripe int    `json:"max_bytes_per_stripe"`
}
Config sets some high level properties for a new Database. It's useful for testing or for passing settings based on cli flags.
type Database ¶
type Database struct {
	sync.Mutex
	Datasets    []*Dataset
	ServerHTTP  *http.Server
	ServerHTTPS *http.Server
	Config      *Config
}
Database is the main struct that contains it all - notably the datasets' metadata and the webserver. Having the webserver here makes it convenient for testing - we can spawn new servers at a moment's notice.
func NewDatabase ¶
NewDatabase initiates a new database object and binds it to a given directory. If the directory doesn't exist, it creates it. If it exists, it loads the data contained within.
func (*Database) AddDataset ¶
AddDataset adds a Dataset to a Database. This is a pretty rare event, so we don't expect much contention - the locking is just to avoid some issues when marshaling the object around in the API etc.
func (*Database) DatasetPath ¶
DatasetPath returns the path of a given dataset (all its stripes live there). ARCH: consider merging this with dataPath, based on a nullable dataset argument (like manifestPath).
func (*Database) GetDataset ¶
func (*Database) GetDatasetByVersion ¶ added in v0.1.3
GetDataset retrieves a dataset based on its UID. OPTIM: this implementation is not efficient, but we don't have a map-like structure to store our datasets - we keep them in a slice so that we have a predictable order -> we need a sorted map.
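The OPTIM note above boils down to a linear scan over the dataset slice. A minimal sketch of that pattern, using a toy stand-in type (the real lookups go by UID or by name/version as the signatures above show):

```go
package main

import "fmt"

// dataset is a minimal stand-in for the package's Dataset, just enough
// to show the linear scan the OPTIM note describes.
type dataset struct {
	Name    string
	Version string
}

// findByNameVersion scans the slice front to back - O(n), but the slice
// preserves insertion order, which is why a plain map isn't used.
func findByNameVersion(datasets []*dataset, name, version string) *dataset {
	for _, ds := range datasets {
		if ds.Name == name && ds.Version == version {
			return ds
		}
	}
	return nil
}

func main() {
	datasets := []*dataset{
		{Name: "sales", Version: "v1"},
		{Name: "sales", Version: "v2"},
		{Name: "users", Version: "v1"},
	}
	ds := findByNameVersion(datasets, "sales", "v2")
	fmt.Println(ds != nil) // true
}
```

A sorted map (or a map index maintained alongside the slice) would make this O(log n) or O(1) while keeping the predictable iteration order.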
func (*Database) GetDatasetLatest ¶ added in v0.1.3
func (*Database) LoadDatasetFromMap ¶
LoadDatasetFromMap allows for an easy setup of a new dataset, mostly useful for tests. It converts the map into an in-memory CSV file and passes it to our usual routines. OPTIM: the underlying call (LoadDatasetFromReaderAuto) caches this raw data on disk, which may be unnecessary.
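The map-to-CSV conversion can be sketched with encoding/csv. The helper below is illustrative, not the package's internal routine: since Go map iteration order is random, it sorts the column names to keep the output deterministic - the real implementation may order columns differently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// mapToCSV renders a column-name -> values map as an in-memory CSV
// (header row first), ready to hand to any reader-based loader.
func mapToCSV(data map[string][]string) (string, error) {
	names := make([]string, 0, len(data))
	nrows := 0
	for name, values := range data {
		names = append(names, name)
		if len(values) > nrows {
			nrows = len(values)
		}
	}
	// Map iteration order is random; sort for a deterministic header.
	sort.Strings(names)

	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write(names); err != nil {
		return "", err
	}
	for row := 0; row < nrows; row++ {
		record := make([]string, len(names))
		for j, name := range names {
			if row < len(data[name]) {
				record[j] = data[name][row]
			}
		}
		if err := w.Write(record); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := mapToCSV(map[string][]string{
		"id":   {"1", "2"},
		"name": {"alice", "bob"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```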
func (*Database) LoadDatasetFromReaderAuto ¶
LoadDatasetFromReaderAuto loads data from a reader and returns a Dataset
func (*Database) LoadSampleData ¶
LoadSampleData reads all CSVs from a given directory and loads them into the database using default settings.
func (*Database) ReadColumnsFromStripeByNames ¶
func (db *Database) ReadColumnsFromStripeByNames(ds *Dataset, stripe Stripe, columns []string) (map[string]column.Chunk, int, error)
OPTIM: perhaps reorder the column requests so that they are contiguous, or at least in order; also add a benchmark that reads columns in reverse to see if we get any benefit from this.
type Dataset ¶
type Dataset struct {
	ID   UID    `json:"id"`
	Name string `json:"name"`
	// ARCH: move the next three to a `Meta` struct?
	Created int64 `json:"created_timestamp"`
	NRows   int64 `json:"nrows"`
	// ARCH: note that we'd ideally get this as the uncompressed size... might be tricky to get
	SizeRaw    int64              `json:"size_raw"`
	SizeOnDisk int64              `json:"size_on_disk"`
	Schema     column.TableSchema `json:"schema"`
	// TODO/OPTIM: we need the following for manifests, but it's unnecessary for writing in our
	// web requests - remove it from there
	Stripes []Stripe `json:"stripes"`
}
Dataset contains metadata for a given dataset, which at this point means a table.
type ObjectType ¶
type ObjectType uint8
ObjectType denotes what type an object is (or its ID) - dataset, stripe etc.
const (
	OtypeNone ObjectType = iota
	OtypeDatabase
	OtypeDataset
	OtypeStripe
)
Object types are reflected in the UID - the first two hex characters encode the object type, so it's clear what sort of object you're dealing with based on its prefix.
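The prefix scheme described above can be sketched as follows, assuming the first byte of the binary UID holds the object type (the payload layout here is made up for illustration - only the two-hex-character prefix matches the documentation):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

type objectType uint8

const (
	otypeNone objectType = iota
	otypeDatabase
	otypeDataset
	otypeStripe
)

// newUIDHex puts the object type in the first byte, so the two leading
// hex characters of the encoded UID identify the kind of object.
func newUIDHex(otype objectType) string {
	raw := make([]byte, 9)
	raw[0] = byte(otype)
	// Fill the rest with random payload (illustrative only).
	if _, err := rand.Read(raw[1:]); err != nil {
		panic(err)
	}
	return hex.EncodeToString(raw)
}

func main() {
	uid := newUIDHex(otypeDataset)
	fmt.Println(uid[:2]) // "02": hex of otypeDataset
}
```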
type RowReader ¶
type Stripe ¶
type Stripe struct {
	Id      UID      `json:"id"`
	Length  int      `json:"length"`
	Offsets []uint32 `json:"offsets"`
}
Stripe only contains metadata about a given stripe; it has to be loaded separately to obtain the actual data.
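One plausible reading of the Offsets field - an assumption for illustration, not the package's spec - is that consecutive offsets delimit the column chunks inside the stripe's on-disk blob, which would be how a reader locates the nth column:

```go
package main

import (
	"bytes"
	"fmt"
)

// chunkAt slices the nth column chunk out of a stripe blob, assuming
// offsets[n] and offsets[n+1] delimit chunk n (hypothetical layout).
func chunkAt(blob []byte, offsets []uint32, n int) ([]byte, error) {
	if n < 0 || n+1 >= len(offsets) {
		return nil, fmt.Errorf("column %d out of range", n)
	}
	start, end := offsets[n], offsets[n+1]
	if int(end) > len(blob) || start > end {
		return nil, fmt.Errorf("corrupt offsets for column %d", n)
	}
	return blob[start:end], nil
}

func main() {
	// Two "columns" packed back to back; a trailing offset marks the end.
	blob := append([]byte("aaaa"), []byte("bb")...)
	offsets := []uint32{0, 4, 6}
	chunk, err := chunkAt(blob, offsets, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(bytes.Equal(chunk, []byte("bb"))) // true
}
```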
type StripeReader ¶
type StripeReader struct {
// contains filtered or unexported fields
}
func NewStripeReader ¶
func NewStripeReader(db *Database, ds *Dataset, stripe Stripe) (*StripeReader, error)
OPTIM: pass in a bytes buffer to reuse it?
func (*StripeReader) Close ¶
func (sr *StripeReader) Close() error
func (*StripeReader) ReadColumn ¶
func (sr *StripeReader) ReadColumn(nthColumn int) (column.Chunk, error)
type UID ¶
type UID struct {
	Otype ObjectType
	// contains filtered or unexported fields
}
UID is a unique ID for a given object; it's NOT a UUID.
func UIDFromHex ¶
ARCH: test this instead of Unmarshal? Or both?
func (UID) MarshalJSON ¶
MarshalJSON satisfies the Marshaler interface, so that we can automatically marshal UIDs as JSON
func (*UID) UnmarshalJSON ¶
UnmarshalJSON satisfies the Unmarshaler interface (we need a pointer here, because we'll be writing to it)