Documentation
¶
Overview ¶
Package chutils is a set of utilities designed to work with ClickHouse. The utilities are designed to facilitate import and export of data.
The chutils package facilitates these types of functions. The principal use cases are:
- file --> ClickHouse
- ClickHouse --> file
- ClickHouse --> ClickHouse
Why is use case 3 helpful?
- Automatic generation of the CREATE TABLE statement
- Data cleaning
- Renaming fields
- Adding fields that may be complex functions of the Input and/or use data from other Go variables.
The chutils package defines:
- An Input interface that reads data.
- A TableDef struct that specifies the structure of the input. Features include:
- The fields/types of a TableDef can be specified, or they can be imputed from the data.
- The corresponding CREATE TABLE statement can be built and issued.
- Checks of the range/values of fields as they are read.
- An Output interface that writes data.
- Concurrent execution of Input/Output interfaces
The file package implements Input and Output for text files. The sql package implements Input and Output for SQL.
These two packages can be mixed and matched for Input/Output.
Example uses ¶
Load a CSV to ClickHouse -- Option 1 (see Example in package file) a. Define a file Reader to point to the CSV. b. Use Init to create the TableDef and then Impute to determine the fields and types. c. Use the Create method of TableDef to create the ClickHouse table to populate. d. Run TableSpecs().Check() to verify the TableSpec is set up correctly. e. Define a file Writer that points a temporary file. f. Use chutils Export to create a temporary file that uses the Reader/Writer. g. Use the Writer Insert method to issue a command to clickhouse-client to load the temporary file.
Load a CSV to ClickHouse -- Option 2 (see Example in package sql). a. same as a, above. b. same as b, above. c. same as c, above. d. same as d, above. e. Define an SQL Writer that points to the table to populate. f. Use chutils Export to create a VALUES insert statement. g. Use the Writer Insert statement to execute the Insert.
Insert to a ClickHouse table from a ClickHouse query -- Option 1. a. Define an sql Reader to define the source query. b. Use Init to define the TableDef and Create to create the output table. c. Run TableSpec().Check() to make sure the TableSpec is set up correctly. d. Use Insert to execute the insert query. (Note, there is no data validation).
Insert to a ClickHouse table from a ClickHouse query -- Option 2. a. Same as a, above. b. Same as b, above. c. Run TableSpec().Check() to make sure the TableSpec is set up correctly. d. Define an sql Writer that points to the table to populate. e. Use chutils Export to create the VALUEs statement that is used to insert into the table. f. Use the Writer Insert statement to execute the Insert. (Note, there *is* data validation).
Index ¶
- Constants
- Variables
- func CommentColumn(table, column, comment string, conn *Connect) error
- func CommentFds(table string, fds map[int]*FieldDef, conn *Connect)
- func Concur(nWorker int, rdrs []Input, wrtrs []Output, after int) error
- func Export(rdr Input, wrtr Output, after int, ignore bool) error
- func GetSystemField(table, sysField, field string, conn *Connect) (fieldVal string, err error)
- func GetSystemFields(conn *Connect, sysField string, table ...string) (map[string]string, error)
- func ReadFS(defDir embed.FS, startDir string) (defs map[string]string, err error)
- func ReadOS(dir string) (defs map[string]string, err error)
- func StrToCol(inStr, sep string, skipFirst bool) (keyval.KeyVal, error)
- func Wrapper(e error, text string) error
- func WriteArray(el interface{}, char, sdelim string) (line []byte)
- func WriteElement(el interface{}, char, sdelim string) []byte
- type ChField
- type ChType
- type Connect
- type ConnectOpt
- type EngineType
- type ErrType
- type FieldDef
- type Input
- type LegalValues
- type OuterFunc
- type OuterFuncs
- type Output
- type Row
- type Status
- type TableDef
- func (td *TableDef) Check() error
- func (td *TableDef) Copy(noLegal bool) *TableDef
- func (td *TableDef) Create(conn *Connect, table string) error
- func (td *TableDef) FieldList() []string
- func (td *TableDef) Get(name string) (int, *FieldDef, error)
- func (td *TableDef) Impute(rdr Input, rowsToExamine int, tol float64) (err error)
- func (td *TableDef) Nest(nestName, firstField, lastField string) error
- type Valid
Constants ¶
const ( OuterArray = 0 + iota OuterLowCardinality OuterNullable )
Variables ¶
var ( DateMissing = time.Date(1970, 1, 2, 0, 0, 0, 0, time.UTC) IntMissing = -1 FloatMissing = -1.0 StringMissing = "!" )
Missing values used when the user does not supply them
var DateFormats = []string{"2006-01-02", "2006-1-2", "2006/01/02", "2006/1/2", "20060102", "01022006", "01/02/2006", "1/2/2006", "01-02-2006", "1-2-2006", "200601", "Jan 2 2006", "January 2 2006", "Jan 2, 2006", "January 2, 2006", time.RFC3339}
DateFormats are formats to try when guessing the field type in Impute()
Functions ¶
func CommentColumn ¶ added in v1.1.28
CommentColumn adds a comment to column 'column' of table 'table'
func CommentFds ¶ added in v1.1.28
CommentFds comments all the columns of table that are represented in fds. Errors are ignored.
func Export ¶
Export transfers the contents of rdr to wrtr. if after == 0 then issues wrtr.Insert when it is done if after > 0 then issues wrtr.Insert every 'after' lines (useful for sql Writers to prevent memory issues) if after < 0, then the output isn't Inserted (for testing) if ignore == true, the read errors are ignored
func GetSystemField ¶ added in v1.1.31
GetComment returns the comment for "field" from db.table
func GetSystemFields ¶ added in v1.1.31
GetComments returns a map of field comments from the slice of tables. The key is the field name and the value is the comment
func ReadFS ¶ added in v1.1.35
ReadFS reads the all the files in embedded directory defDir and returns the results as a map. The key to the map is the file name, the value is the contents of the file.
func ReadOS ¶ added in v1.1.35
ReadOS reads the all the files indirectory dir and returns the results as a map. The key to the map is the file name, the value is the contents of the file.
func StrToCol ¶ added in v1.1.35
StrToCol takes a string that's in key/val format and returns a Keyval object
func WriteArray ¶ added in v1.1.12
WriteArray writes a ClickHouse Array type. The format is "array(a1,a2,...)"
func WriteElement ¶ added in v1.1.12
WriteElement writes a single field with separator char. For strings, the text qualifier is sdelim. If sdelim is found, it is doubled.
Types ¶
type ChField ¶
type ChField struct { Base ChType // Base is base type of ClickHouse field. Length int // Length is length of field (0 for String). Funcs OuterFuncs // OuterFunc is the outer function applied to the field (e.g. LowCardinality(), Nullable()) Format string // Format for incoming dates from Input, or outgoing Floats }
ChField struct holds the specification for a ClickHouse field
func NewChFieldKV ¶ added in v1.1.35
NewCHFieldKV creates a new ChField based on the entries of kv
type ChType ¶
type ChType int
ChType enum is supported ClickHouse field types.
type Connect ¶
type Connect struct { Host string // Host is host IP User string // User is ClickHouse username Password string // Password is user's password *sql.DB // ClickHouse database connector // contains filtered or unexported fields }
Connect holds the ClickHouse connect information
func NewConnect ¶
func NewConnect(host, user, password string, settings clickhouse.Settings, opts ...ConnectOpt) (con *Connect, err error)
NewConnect established a new connection to ClickHouse. host is IP address (assumes port 9000), memory is max_memory_usage
type ConnectOpt ¶ added in v1.1.17
type ConnectOpt func(conn *Connect)
ConnectOpt is the type of function to set options for Connect
func WithTimeOut ¶ added in v1.1.17
func WithTimeOut(timeOutMinutes int64) ConnectOpt
type EngineType ¶
type EngineType int
EngineType enum specifies ClickHouse engine types
const ( MergeTree EngineType = 0 + iota Memory )
func (EngineType) String ¶
func (i EngineType) String() string
type FieldDef ¶
type FieldDef struct { Name string // Name of the field. ChSpec ChField // ChSpec is the Clickhouse specification of field. Description string // Description is an optional description for CREATE TABLE statement. Legal *LegalValues // Legal are optional bounds/list of legal values. Missing interface{} // Missing is the value used for a field if the value is illegal. Default interface{} // Default is the value used for a field if the field is empty Width int // Width of field (for flat files) Drop bool // Drop, if True, instructs Outputs to ignore this field }
FieldDef struct holds the full definition of single ClickHouse field.
func NewFieldDef ¶
func NewFieldDefKV ¶ added in v1.1.35
NewFieldDefKV builds a FieldDef based on the entries of kv
func (*FieldDef) CheckRange ¶
CheckRange checks whether checkVal is a legal value. Returns fd.Missing, if not.
type Input ¶
type Input interface { Read(nTarget int, validate bool) (data []Row, valid []Valid, err error) // Read from the Input, possibly with validation Reset() error // Reset to beginning of source CountLines() (numLines int, err error) // CountLines returns # of lines in source Seek(lineNo int) error // Seek moves to lineNo in source Close() error // Close the source TableSpec() *TableDef // TableSpec returns the TableDef for the source }
The Input interface specifies the requirements for reading source data.
type LegalValues ¶
type LegalValues struct { LowLimit interface{} // Minimum legal value for types Int, Float HighLimit interface{} // Maximum legal value for types Int, Float Levels []string // Legal values for types String, FixedString }
LegalValues holds bounds and lists of legal values for a ClickHouse field
func NewLegalKV ¶ added in v1.1.35
func NewLegalKV(kv keyval.KeyVal) *LegalValues
NewLegalKV creates a new *LegalValues based on the entries of kv
func NewLegalValues ¶
func NewLegalValues() *LegalValues
NewLegalValues creates a new LegalValues type
type OuterFuncs ¶ added in v0.1.7
type OuterFuncs []OuterFunc
OuterFuncs is a slice of OuterFunc since we can have multiple of these in a field
func (OuterFuncs) Has ¶ added in v0.1.7
func (o OuterFuncs) Has(target OuterFunc) bool
type Output ¶
type Output interface { Write(b []byte) (n int, err error) // Write byte array, n is # of bytes written. Writes do not go to ClickHouse (see Insert) Name() string // Name of output (file, table) Insert() error // Inserts into ClickHouse Separator() rune // Separator returns the field separator EOL() rune // EOL returns the End-of-line character Text() string // Text is text qualifier for strings. If it is found, Export doubles it. For CH, this is a single quote. Close() error // Close writer }
The Output interface specifies requirements for writing data.
type Row ¶
type Row []interface{}
Row is single row of the table. The fields may be of any type. A Row is stored in the same order of the TableDef FieldDefs slice.
type Status ¶
type Status int
Status enum is the validation status of a particular instance of a ChField field as judged against its ClickHouse type and acceptable values
type TableDef ¶
type TableDef struct { Key string // Key is the key for the table. Engine EngineType // EngineType is the ClickHouse table engine. FieldDefs map[int]*FieldDef // Map of fields in the table. The int key is the column order in the table. // contains filtered or unexported fields }
TableDef struct defines a table.
func NewTableDef ¶
func NewTableDef(key string, engine EngineType, fielddefs map[int]*FieldDef) *TableDef
NewTableDef creates a new TableDef struc
func NewTableDefKV ¶ added in v1.1.35
NewTableDefKV creates a *TableDef from a directory of files. Each file defines a field. The file consists of key/val pairs which define the field. The list of keys is below. The field name is the name of the file.
key - field name string is in keyval format. Valid Keys are:
type*. float, int, string, fixedString, date
order*. order of the field in the table, starting with 0.
key. Order of key: 0, 1, 2,....
length. Appropriate length for the field type, if applicable. Defaults to 32 for int, 64 for float. Required for fixed string.
low. Minimum legal value (float, int, date).
high. Maximum legal value (float, int, date).
levels. Legal levels (string, fixedString)
missing*. Value to use if the actual value is illegal.
default*. Value to use if the actual value is missing.
format. Date format for date fields.
description. Description of the field.
*required.
func (*TableDef) Check ¶
Check verifies that the fields are of the type chutils supports. Check also converts, if needed, the Legal.HighLimit and Legal.LowLimit and Missing interfaces to the correct type and checks LowLimit <= HighLimit. It's a good idea to run Check before reading the data.
func (*TableDef) Copy ¶ added in v1.1.18
Copy makes an independent (no shared memory) of the TableDef. if noLegal=true, then the LegalValues are not copied over (so there would be no validation check)
func (*TableDef) Create ¶
Create builds and issues CREATE TABLE ClickHouse statement. The table created is "table"
func (*TableDef) FieldList ¶ added in v1.1.18
FieldList returns a slice of field names in the same order they are in the data
func (*TableDef) Get ¶
Get returns the FieldDef for field "name". The FieldDefs map is by column order, so access by field name is needed.
Source Files
¶
Directories
¶
Path | Synopsis |
---|---|
Package file implements Input/Output for text files.
|
Package file implements Input/Output for text files. |
Package nested allows for additional calculations to be added to the output of a chutils.Input reader.
|
Package nested allows for additional calculations to be added to the output of a chutils.Input reader. |
Package sql implements Input/Output for SQL.
|
Package sql implements Input/Output for SQL. |
Package str implements the Input interface for strings by leveraging the file.Reader type.
|
Package str implements the Input interface for strings by leveraging the file.Reader type. |
Package to provides simple methods to write chutils.Input data to a CSV or ClickHouse
|
Package to provides simple methods to write chutils.Input data to a CSV or ClickHouse |