mobsql

package module
v0.9.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 24, 2024 License: GPL-2.0 Imports: 7 Imported by: 4

README

Mobsql

builds.sr.ht status

Mobsql is a Go library and commandline application which facilitates loading one or multiple GTFS feeds into a SQLite database. Either user-provided GTFS ZIP feed archives or Mobility Database feeds (as specified by Mobility Database feed IDs) may be loaded. Mobsql's internal SQLite schema mirrors GTFS's spec but adds a feed_id field to each table (thus allowing multiple feeds to be loaded to the database simulatenously).

While primarily developed to be used by Mobroute, the general purpose GTFS router and related project, Mobsql itself is a fully independent tool and can be used as a standalone general-purpose GTFS-to-SQLite ETL & import utility (either via its CLI or as a Go library).

Documentation:

Project Overview / Featureset:

  • Imports GTFS ZIP archives into a local SQLite database with an additional 'feed_id' field in all GTFS tables; thus allowing multiple feeds to be stored without conflict.
  • Supports loading local filesystem user-specified GTFS ZIP archives, remote HTTP user-specified GTFS ZIP archives, and Mobility Database sourced ZIP archives.
  • Supports bulk import (e.g. 1-insert-for-multiple rows) functionality to decrease load time.
  • Utilizes SQLite's CSV Virtual table extension for optimized and fast imports of raw CSV GTFS data to SQLite DB.
  • Supports a caching system, storing a checksum of the imported GTFS table(s) such that successive imports on the same data nops effectively if there would be no net change.
  • In addition to downloading & GTFS loading functionality, can be used to compute contrived data (similar to materialized views), purge, and query status information about feeds.
  • Allows both searching and import of GTFS feeds by MDBID (from the Mobility Database) to import rather then making user source their own custom GTFS ZIP archives.
  • Simply models the database schema including: GTFS specification import rules, Mobility Database catalog (CSV) imported as table, internal tracking tables, and SQLite views in a single file - see schema datastructure.
  • Implements conversion logic building atop the GTFS schema for fields which can be stored more efficiently (such as stop_times's departure_time/arrival_time as integers) rather then as colon time strings.
  • Allows the creation of SQL views (currently used for internal tracking).
  • Implements automatic index creation based on schema specification.
  • Implements automatic creation of computed tables based on view logic (e.g. similar to the concept of materialized views).
  • Implements cleanup logic to gracefully handle partial / interrupted loads (e.g. killing process during GTFS zip extraction & SQL import / operations).

Documentation

Overview

Package mobsql is a library which offers functionality for downloading and loading one or multiple GTFS archives (pulled from either the Mobility Database catalog or specified by the user) into a SQLite database.

The SQLite database (seeded through mobsql) mirrors GTFS's specification (e.g. there is a transfers, stop_times, stops, agency tables etc.); however each GTFS table also has an additional 'feed_id' column. The feed_id column refers to Mobility Database mdb_id field (or in the case of a custom feed ID is a negative user-provided integer). Besides that, GTFS schema is imported as 1-to-1 for GTFS schedule specification sans several exceptions (such as stop_times conversion of departure_time to int). Exceptions, GTFS config, and internal tables can be understood by examining the config git.sr.ht/~mil/mobsql/schema.Schema.

Other then GTFS import functionality from the Mobility Database & custom GTFS feeds, the library itself offers a few auxiliary key functionalities:

  • Ability to update feeds after import
  • Ability to remove added feeds from the database
  • Ability to compute GTFS-derived table rows (per GTFS-feed)

Importantly, the end user of Mobsql should note, the only package endconsumers of this library should import and use directly is the top level git.sr.ht/~mil/mobsql package; everything is aliased from there. Subpackages are internal implementations & may change between versions.

Building

Ensure sqlite-dev is installed and Go >1.21

Also note: when building you must pass `-tags=sqlite_math_functions` to the `go build` command as this enables the underlying sqlite library to utilize math functions which Mobsql's core logic depends on. Example for build with tags below.

Example:

go build -tags=sqlite_math_functions foo.go

Overview of Functions

There are 7 public functions exposed from the library; and understanding each is equivalent to grokking the library as a whole. The 7 functions are:

  • InitializeRuntime: Initializes a Mobsql 'runtime' which represents a connection to the SQLite DB & configuration params which all other library operations depend on.
  • FeedsearchFilterToFeedIDs: Search the internal database for feeds matching search filter specification. Good for looking up Mobility Database feed IDs.
  • FeedLoadMDBGTFS: Load a GTFS feed from the Mobility Database.
  • FeedLoadCustomGTFS: Load a GTFS feed from a local ZIP archive or a remote GTFS ZIP archive.
  • FeedCompute: Compute custom tables as specified by SchemaExtra in the RuntimeConfig for the given feed IDs.
  • FeedPurge: Remove the given feed IDs from the database.
  • FeedStatus: Query metadata, loaded, and computed status for the given feed IDs (a RO operation).

The general usage pattern for using Mobsql as an library is to:

See below examples for usages or the `cli` package which implement all major functionality of the library itself.

Standalone (Runnable) Examples

Within the `examples/` folder there are a number of standalone runnable examples. You can run each of these examples by using go run; note to make sure to specify the sqlite_math_functions build tag.

go run -tags=sqlite_math_functions examples/loadmdbgtfs/main.go

Currently implemented examples are as follows:

An easy way to get started with library usage is to simply run & adapt these examples to your needs.

Index

Constants

View Source
const DTypeEnum = apptypes.DTypeEnum

DTypeEnum represents an enum in SQLLite (currently stored as type int)

View Source
const DTypeID = apptypes.DTypeID

DTypeID represents an ID in SQLite, (currently correlates to type `text` however this may change in the future when a more applicable ID type is found).

View Source
const DTypeInt = apptypes.DTypeInt

DTypeInt represents an integer in SQLite (correlates to type `int`)

View Source
const DTypeReal = apptypes.DTypeReal

DTypeReal represents an real in SQLite (correlates to type `real`)

View Source
const DTypeText = apptypes.DTypeText

DTypeText represents text in SQLite (correlates to type `text`)

View Source
const PurgeTablesOptionAll = apipurge.PurgeTablesOptionAll

PurgeTablesOptionAll purges all tables

View Source
const PurgeTablesOptionComputed = apipurge.PurgeTablesOptionComputed

PurgeTablesOptionGTFS purges user-defined computed tables

View Source
const PurgeTablesOptionGTFS = apipurge.PurgeTablesOptionGTFS

PurgeTablesOptionGTFS purges GTFS-related tables

Variables

This section is empty.

Functions

func FeedsearchFilterToFeedIDs added in v0.6.0

func FeedsearchFilterToFeedIDs(m *MobsqlRuntime, filter *FeedsearchFilter) ([]int, error)

FeedsearchFilterToFeedIDs translates the input search filter into an array of (Mobility Database and custom) feed ids.

This functionality is very helpful for quickly 'searching through' the Mobility Database for feed IDs which can be loaded and worked with via Mobsql.

Similar functionality is ofcourse available on the Mobility Database Website, however this functionality uses the local Mobility Database CSV and thus is ensured to be in-sync with the loaded GTFS feeds etc. As such, you may use this functionality to build out a 'feedsearch' functionality or similar in end-user applications.

Types

type ComputedTable added in v0.4.0

type ComputedTable = apptypes.ComputedTable

ComputedTable represents a computed table specification

type DType added in v0.9.0

type DType = apptypes.DType

DType represents a type corresponding to an underlying SQLite type

type FeedOpResult added in v0.6.0

type FeedOpResult = apptypes.FeedOpResult

FeedOpResult represents the result of a compute, purge, or loadmdbgtfs, or loadcustomgtfs request. This type exposes whether the operation was successful and the effected feedIDs.

In the future this type may be augmented with additional metadata related to the response status.

func FeedCompute added in v0.6.0

func FeedCompute(m *MobsqlRuntime, feedIDs []int) (*FeedOpResult, error)

FeedCompute recomputes the 'computed' table specifications as defined by SchemaExtra in runtime initialization.

A computed table is defined by ComputedTable and similar to the concept of a materialized view however works on a per-GTFS-feed level. The overall idea behind computed tables is that end-library users can defined specific GTFS-derived tables (and then 'recompute') these tables per GTFS on new data load. The GTFS-derived computed tables always map directly to a SQL select statement (and optionally pre/post init/deinit exec SQL logic).

An example of computed tables usage is within Mobroute, for example the Connection-Scan-Algorithm master 'connections' table (_ctconn) is defined as a view through a ComputedTable spec.

func FeedLoadCustomGTFS added in v0.7.0

func FeedLoadCustomGTFS(m *MobsqlRuntime, feedID int, feedURI string) (*FeedOpResult, error)

FeedLoadCustomGTFS loads the given feedID to the database for the feedURI allowing the loading of any arbitrary user-provided GTFS file to the SQLite database.

The feedID MUST be a negative integer as this allows disambiguation from Mobility Database feed IDs which are always positive integers. And the feedURI parameter may be either a file:// URI OR a http(s):// URI.

If you want to load a MobilityDatabase GTFS feed you may used the related FeedLoadMDBGTFS function in which case you don't have to specify the URI etc.

Note: Unlike the FeedLoadMDBGTFS function, an implicit FeedPurge operation is always run on the provided feed ID prior to load as this function is intended to be a one-time process and thus avoids the caching system

func FeedLoadMDBGTFS added in v0.9.0

func FeedLoadMDBGTFS(m *MobsqlRuntime, feedIDs []int, agencyDirectURL bool, update bool) (*FeedOpResult, error)

FeedLoadMDBGTFS loads the given Mobility Database feedIDs to the database. This involves both downloading the associated feed's GTFS ZIP archive (as specified by the Mobility Database) and then loading the feed via SQLite.

See Feed IDs doc for information on valid Mobility Database feed IDs that may be used. Also note that FeedsearchFilterToFeedIDs may be used to search for valid feeds.

If the update flag is set to true, the Mobility Database is force downloaded prior to loading & additionally the filecache & shacaching mechanisms are wholesale disabled. This is applicable in the case of updating feeds if you want to ensure that the loaded feed is the *latest* available data (otherwise in default scenario, the Mobility Database & GTFS data is cached with a 1-week expiry so if you perform subsequent load operations things are cached for a week out).

If the agencyDirectURL flag is set the 'direct' agency URL is utilized in load (meanwhile by default / if false), Mobility Database's CI bucket URLs are used (which may be slightly out of date as these are periodically updated but are vastly more reliable then the numerous agencies upstream URLs).

If you want to load a *custom* (non Mobility-Database) GTFS archive you may used the related FeedLoadCustomGTFS function.

func FeedPurge added in v0.6.0

func FeedPurge(m *MobsqlRuntime, feedIDs []int, purgeTablesOption PurgeTablesOption) (*FeedOpResult, error)

FeedPurge removes from the database the given feedIDs. There are two different types of tables that can be cleared as determined by the PurgeTablesOption.

If PurgeTablesOption is set to PurgeTablesOptionGTFS then only GTFS tables will be cleared. Alternatively PurgeTablesOptionComputed clears computed tables. And PurgeTablesOptionAll clears both GTFS & computed tables for the feed IDs.

type FeedStatusInfo added in v0.6.0

type FeedStatusInfo = apistatus.FeedStatusInfo

FeedStatusInfo represents the status of a particular GTFS feed from the MobilityDatabase. This is returned as a query mechanism from the FeedStatus function and can determine various metadata about the feed both from the Mobility DB's CSV (as imported to the local DB) in addition to interal mobsql-specific metadata (such as if the feed has been loaded, computed, etc.)

func FeedStatus added in v0.6.0

func FeedStatus(m *MobsqlRuntime, feedIDs []int) ([]FeedStatusInfo, error)

FeedStatus queries the database for the given feedIDs from the internal database and returns an array of FeedStatusInfo. See the underlying FeedStatusInfo for information available which includes metadata from the Mobility Database in addition to loaded and computed status for the given feeds.

type FeedsearchFilter added in v0.6.0

type FeedsearchFilter = apifeedsearch.FeedsearchFilter

FeedsearchFilter represents a 'filter' that ultimately acts as a search mechanism to determine feed IDs to use. This is a flexibly way of querying the MobilityDatabase's catalog for potential feed ids (MDBIDs) to use.

type LoadColumn added in v0.4.0

type LoadColumn = apptypes.LoadColumn

LoadColumn represents a single column to load for a TableSpec.

type LoadSchema added in v0.9.0

type LoadSchema = apptypes.LoadSchema

LoadSchema holds the core 'database schema' for the entire SQLite DB. This tracks both GTFS tables in TablesGTFS, a tracking table, MDB & Custom source tracking tables, and system views.

type LoadTable added in v0.9.0

type LoadTable = apptypes.LoadTable

LoadTable represents a table to be loaded per TableSpec to SQLite backed by an associated file.

type MobsqlRuntime added in v0.4.0

type MobsqlRuntime = apptypes.MobsqlRuntime

MobsqlRuntime holds the database connection and some global configuration properties. All public functionality in the library depends on this struct and it may be created using InitializeRuntime.

func InitializeRuntime added in v0.4.0

func InitializeRuntime(runtimeConfig *RuntimeConfig) (*MobsqlRuntime, error)

InitializeRuntime must be called before all other functions for mobsql and it creates the MobsqlRuntime. This holds the database connection and calling this function ensures some bootstrapping DB seeding etc. occurs.

Internally this function does 4 primary things:

  1. Opens the SQLite database - setting pragmas etc according
  2. Runs schema creation logic (e.g. create tables)
  3. Fetches from HTTP the Mobility Database CSV file & imports to _mdb table (cached)
  4. Returns the prepared, returning MobsqlRuntime{} which all other APIs operate based on (which contains the opened DB)

type PurgeTablesOption added in v0.5.0

type PurgeTablesOption = apipurge.PurgeTablesOption

PurgeTablesOption enumerates the options for purging used by FeedPurge

type RuntimeConfig added in v0.4.0

type RuntimeConfig = apptypes.RuntimeConfig

RuntimeConfig contains the configuration for the library usage used in creating a MobsqlRuntime by InitializeRuntime. These are global properties defined one-off upon initialization.

All properties if unset will just use defaults and specifying values in this struct is an advanced-usage pattern.

type SchemaExtra added in v0.4.0

type SchemaExtra = apptypes.SchemaExtra

SchemaExtra represents extra arbitrary SQL views & computed tables that may build upon the feed GTFS tables.

type TableSpec added in v0.4.0

type TableSpec = apptypes.TableSpec

TableSpec specifies a table to be created in SQLite with the columns indexes, and unique specs as indicated.

type View added in v0.4.0

type View = apptypes.View

View represents a SQLite view as defined by the SQLCreateStatement. The SQLCreateStatement should be in the form of a select query - for example `select 1` would be valid. From application logic this is always created via `create view foo as {SQLCREATESTATEMENT}`; so note do not add the create view bit yourself.

Directories

Path Synopsis
api
apicompute
Package apicompute contains internal implementation for FeedCompute
Package apicompute contains internal implementation for FeedCompute
apifeedsearch
Package apicompute contains internal implementation for FeedsearchFilterToFeedIDs
Package apicompute contains internal implementation for FeedsearchFilterToFeedIDs
apiinit
Package apiinit contains internal implementation for InitializeRuntime
Package apiinit contains internal implementation for InitializeRuntime
apiload
Package apiload contains internal implementation for FeedLoad & FeedLoadCustomGTFS
Package apiload contains internal implementation for FeedLoad & FeedLoadCustomGTFS
apipurge
Package apipurge contains internal implementation for FeedPurge
Package apipurge contains internal implementation for FeedPurge
apistatus
Package apipurge contains internal implementation for FeedStatus
Package apipurge contains internal implementation for FeedStatus
apitableimport
Package apitableimport contains internal implementation for ImportTableFromSpec
Package apitableimport contains internal implementation for ImportTableFromSpec
Package apptypes contains internal types used by the application across other packages.
Package apptypes contains internal types used by the application across other packages.
Package main is the CLI interface for mobsql consuming the mobsql library.
Package main is the CLI interface for mobsql consuming the mobsql library.
examples
computedtables
Mobsql - Computed Tables Example:
Mobsql - Computed Tables Example:
feedsearch
Mobsql - Feed Search Example:
Mobsql - Feed Search Example:
loadcustomgtfs
Mobsql - Custom GTFS Load Example:
Mobsql - Custom GTFS Load Example:
loadmdbgtfs
Mobsql - Mobility DB GTFS Load Example:
Mobsql - Mobility DB GTFS Load Example:
queryfeedstatus
Mobsql - Query Feed Status Example
Mobsql - Query Feed Status Example
Package config contains the data structures used to represent the database schema for seeding and loading
Package config contains the data structures used to represent the database schema for seeding and loading
Package sqlitecsv contains internal implementation which compiles the SQLite CSV virtualtable via CGO
Package sqlitecsv contains internal implementation which compiles the SQLite CSV virtualtable via CGO
util
utildb
Package utildb contains internal DB-related helper functions
Package utildb contains internal DB-related helper functions
utilfiles
Package utildownload contains internal helper functions related to file IO utilized by the primary mobsql app logic.
Package utildownload contains internal helper functions related to file IO utilized by the primary mobsql app logic.
utilfuncs
Package utildownload contains internal helper functions utilized by the primary mobsql app logic.
Package utildownload contains internal helper functions utilized by the primary mobsql app logic.
utilhttp
Package utilhttp contains some internal download functions utilized by the primary mobsql app logic.
Package utilhttp contains some internal download functions utilized by the primary mobsql app logic.
utillog
Package utillog is a simple logging interface with three different types of log messages that can be enabled via the 3 variables LogInfo, LogWarn, and LogDebug.
Package utillog is a simple logging interface with three different types of log messages that can be enabled via the 3 variables LogInfo, LogWarn, and LogDebug.
utilsqlgen
Package utildb contains internal SQL-generation related logic
Package utildb contains internal SQL-generation related logic

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL