fsm

package

v0.3.2 Latest Latest Go to latest Published: Mar 13, 2025 License: MIT Imports: 6 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

codeberg.org/japh/psv

Links

Open Source Insights

Documentation ¶

Overview ¶

fsm provides a finite state machine parser for parsing e.g. rulers

This state machine can

correctly parse matching nested structures
allows for tabular definitions of a language to be parsed

see also: codeberg.org/japh/go-fsmparser

Overview:

var stateMachine = fsm.LoadStateMachine(...)

func (exp *Expression)Parse( input string ) {
    lexer  := fsm.StringLexer( input )
    parser := fsm.Parser(
                        fsm.WithStateMachine( stateMachine ),
                        fsm.WithLexer( lexer ),
                        )
    for parser.Scan() {
        token := parser.Next()
        switch token.Name {
            ...
        }
    }
}

State Transition Tables:

State transitions are defined as tupels of information for each possible transition. Each transition describes an edge in a graph, in which each state is represented by a graph node and each transition is an edge between two nodes.

At the very minimum, each state transition must provide:

an source state
a symbol to match in the input stream (may be the empty string)
the name of a token to emit when the symbol is found
a target state

Which might look something like this:

| state           | input    | token      | next          | notes                                                                     |
| --------------- | -------- | ---------- | ------------- | ------------------------------------------------------------------------- |
| want_expression |          | start_exp  | want_term     | initial state                                                             |
| want_term       | {number} | term       | want_operator | {number} will be matched via a custom matcher                             |
| want_term       |          | finish_exp | done          | an empty symbol always matches - must be the last transition from a state |
| want_operator   | +        | add        | want_term     | `+` will be matched verbatim                                              |
| want_operator   | -        | subtract   | want_term     |                                                                           |
| done            |          |            |               | final state                                                               |

The first row of the table must be a header consisting of column names, but the actual names you use can be configured to your liking.

The full set of transition fields are:

the source state
a 'slurp' symbol to be consumed before searching for this state's next symbol
a symbol to match against the input
a token to emit when the symbol is found and the trasition is followed
an error token to emit when this transition is triggered
a target state
a state to push on the stack when this transition was triggered
a symbol to pop from the stack, i.e. the transtion for ')' expects the popped symbol to be '('

e.g.:

loader, err := fsm.NewTableLoader(
    //              column id     column name
    fsm.WithColumn( fsm.StateCol, "state"      ),
    fsm.WithColumn( fsm.SlurpCol, "slurp"      ),
    fsm.WithColumn( fsm.MatchCol, "input"      ),
    fsm.WithColumn( fsm.TokenCol, "token"      ),
    fsm.WithColumn( fsm.ErrorCol, "error"      ),
    fsm.WithColumn( fsm.NextCol,  "next state" ),
    fsm.WithColumn( fsm.PopCol,   "pop"        ),
    fsm.WithColumn( fsm.PushCol,  "push"       ),
    ...
)

Index ¶

Variables
type ColumnID
- func (id ColumnID) String() string
type Error
- func (err *Error) Error() string
- func (err *Error) String() string
type Lexer
- func BytesLexer(input []byte) Lexer
- func StringLexer(input string) Lexer
- func WordsLexer(input []string) Lexer
type Location
type Parser
- func (p *Parser) Tokens(l Lexer) iter.Seq2[*Token, *Error]
type Stack
- func (s *Stack) Pop() (*StackFrame, error)
- func (s *Stack) Push(sym Symbol, state StateName, loc string)
type StackFrame
type State
- func (s *State) AddTransition(t Transition)
- func (s *State) String() string
- func (s *State) Transitions() []Transition
type StateMachine
- func LoadStateMachine(opts ...TableLoaderOpt) *StateMachine
- func NewStateMachine() *StateMachine
- func (st *StateMachine) AddTransition(id string, skip bool, state, slurp, push, next StateName, pattern string, ...) (Transition, error)
- func (st *StateMachine) State(name StateName) *State
type StateName
type StringLocation
- func (l *StringLocation) Advance(in string)
- func (l StringLocation) String() string
type Symbol
type SymbolMatcherFunc
- func PrefixMatcher(wantPrefix string) SymbolMatcherFunc
- func RegexpMatcher(pattern string) SymbolMatcherFunc
type TableLoader
- func NewLoader(opts ...TableLoaderOpt) (*TableLoader, error)
- func (l *TableLoader) UnmarshalText(input []byte) error
type TableLoaderOpt
- func FromTableString(input string) TableLoaderOpt
- func WithColumn(id ColumnID, name string) TableLoaderOpt
- func WithInitialState(name string) TableLoaderOpt
- func WithMatcher(keyword string, matcher SymbolMatcherFunc) TableLoaderOpt
- func WithRegexpMatcher(keyword, re string) TableLoaderOpt
type Token
- func (tok *Token) String() string
type Transition
- func (t *Transition) Validate() error
type WordLocation
- func (loc WordLocation) String() string

Constants ¶

This section is empty.

Variables ¶

View Source

var (
	ErrMissingStateTable      = errors.New("missing state table")
	ErrMissingInput           = errors.New("missing input")
	ErrBadState               = errors.New("failed to parse state")
	ErrEmptyStack             = errors.New("empty stack")
	ErrUnmatchedSymbol        = errors.New("unmatched")
	ErrEmptyKeyword           = errors.New("empty keyword")
	ErrDuplicateMatcher       = errors.New("duplicate matcher name")
	ErrDuplicateColumnNames   = errors.New("duplicate column name")
	ErrDuplicateColumnIDs     = errors.New("duplicate column id")
	ErrMissingColumnNames     = errors.New("missing required name for column")
	ErrUnmappedColumnName     = errors.New("unmapped column name")
	ErrMissingRequiredColumns = errors.New("missing required column")
	ErrUnmarshallStateTable   = errors.New("failed to unmarshal state table")
	ErrUnnamedState           = errors.New("transitions from an unnamed state are not supported")
	ErrPushAndPop             = errors.New("transitions with push and pull are not supported")
)

Functions ¶

This section is empty.

Types ¶

type ColumnID ¶

type ColumnID uint8

const (
	StateCol ColumnID = iota // current state
	SlurpCol                 // a state to 'slurp' before processing the current state
	MatchCol                 // the input to match
	TokenCol                 // the name of the token to emit when triggered
	ErrorCol                 // the description of an error to emit when triggered
	PushCol                  // the next state to return to after the next transition has terminated
	PopCol                   // the symbol which was used to start the popped state
	NextCol                  // the next state to transition to
	SkipCol                  // transition will be ignored if this is not ""
)

func (ColumnID) String ¶

func (id ColumnID) String() string

type Error ¶

type Error struct {
	Message error
	Symbol
	Location string
}

Error represents a parsing error

func (*Error) Error ¶

func (err *Error) Error() string

Error returns just the error message, without symbol or location information

fulfil error interface

func (*Error) String ¶

func (err *Error) String() string

String returns a fully formatted string containing the error message, parsed symbol (if known) and the location of the error in the input.

type Lexer ¶

type Lexer interface {
	Peek() []byte     // returns the upcoming bytes
	Advance(int)      // skips over a number of bytes, or a single token
	Consume() Symbol  // consumes everything peeded and advanced since the last Consume
	Location() string // returns the lexer's current position in its input
	String() string   // returns the position and string at the lexers current position in its input
}

Lexer provides a cursor over an input

The lexer used for fsm is a "ratchet"

the current position in the input text can be advanced by any amount
symbols can be consumed after multiple advances
it is not possible to move the cursor backwards
lexer.Peek() returns the text starting at the current cursor position (pos below)
lexer.Consume() returns the text between start and pos, and then sets start = pos

e.g.

        parsed symbols         next symbol         unparsed text
input: [......................|...........|.....................]
                              ^           ^
                            start        pos

func BytesLexer ¶

func BytesLexer(input []byte) Lexer

BytesLexer provides a lexical scanner for []byte input

see fsm.Lexer

func StringLexer ¶

func StringLexer(input string) Lexer

StringLexer provides a lexical scanner for string input

see fsm.Lexer

func WordsLexer ¶

func WordsLexer(input []string) Lexer

wordsLexer scans a pre-split slice of words.

Differences to BytesLexer and StringLexer:

Peek() only returns the next word
Consume() is equivalent to Peek() and Advance(1)
Advance() has no effect

see fsm.Lexer

type Location ¶

type Location interface {
	String() string
}

type Parser ¶

type Parser struct {
	*StateMachine // read only, required
}

Parser generates a stream of tokens for the caller.

Note: Parser is not safe for concurrent use!

func (*Parser) Tokens ¶

func (p *Parser) Tokens(l Lexer) iter.Seq2[*Token, *Error]

type Stack ¶

type Stack struct {
	// contains filtered or unexported fields
}

Stack is used to track nested structures while parsing.

For example, any pair of matching tokens, such as braces or quotes.

func (*Stack) Pop ¶

func (s *Stack) Pop() (*StackFrame, error)

func (*Stack) Push ¶

func (s *Stack) Push(sym Symbol, state StateName, loc string)

type StackFrame ¶

type StackFrame struct {
	Symbol
	StateName
	Location string
}

type State ¶

type State struct {
	Name StateName
	// contains filtered or unexported fields
}

func (*State) AddTransition ¶

func (s *State) AddTransition(t Transition)

func (*State) String ¶

func (s *State) String() string

func (*State) Transitions ¶

func (s *State) Transitions() []Transition

type StateMachine ¶

type StateMachine struct {
	InitialState StateName
	// contains filtered or unexported fields
}

func LoadStateMachine ¶

func LoadStateMachine(opts ...TableLoaderOpt) *StateMachine

LoadStateMachine reads a table of state transitions into a machine which can be used by the parser to parse the language specified via the state transitions.

var stateMachine = fsm.LoadStateMachine(

	// where to get the state transitions
	fsm.FromTableString(stateTransitions),

	// column definitions of stateTransitions table
	fsm.WithColumn(fsm.StateCol, "state"),
	fsm.WithColumn(fsm.MatchCol, "input"),
	fsm.WithColumn(fsm.TokenCol, "token"),
	fsm.WithColumn(fsm.NextCol, "next state"),
	fsm.WithColumn(fsm.ErrorCol, "error"),

	// special symbol matchers
	fsm.WithMatcher("{vertical}", fsm.RegexpMatcher(`[|]`)),
	fsm.WithMatcher("{horizontal}", fsm.RegexpMatcher(`[-=]+`)),
	fsm.WithMatcher("{space}", fsm.RegexpMatcher(`\s+`)),
	fsm.WithMatcher("{sort}", fsm.RegexpMatcher(`[\^v]`)), // ^ or v
	fsm.WithMatcher("{prio}", fsm.RegexpMatcher(`[0-7]`)), // 0 ignored

	// initial state
	fsm.WithInitialState("start"),
)

// prepare loader l := NewTableLoader(

WithColumn(fsm.StateCol, "a"),
WithColumn(fsm.MatchCol, "b"),
WithColumn(fsm.TokenCol, "c"),
WithColumn(fsm.ErrorCol, "d"),
WithColumn(fsm.NextCol, "e"),
WithMatcher("{bar}", fsm.RegexMatcher(`bar+`)),

)

// load state machine from table: // | a | b | c | d | e | // | -- | ----- | --------- | - | -- | // | s0 | start | start tok | | s1 | // : : : : : :

if err := l.UnmarshalText([]byte(table)); err != nil {
     return err
}

// parse some input p := NewParser(l.StateTable) p.Parse(input)

REFACTOR: 2025-03-08 add error return

func NewStateMachine ¶

func NewStateMachine() *StateMachine

func (*StateMachine) AddTransition ¶

func (st *StateMachine) AddTransition(
	id string,
	skip bool,
	state, slurp, push, next StateName,
	pattern string,
	matcher SymbolMatcherFunc,
	popSymbol Symbol,
	token, err string) (Transition, error)

func (*StateMachine) State ¶

func (st *StateMachine) State(name StateName) *State

type StateName ¶

type StateName string // name of a state

const FINAL_STATE StateName = ""

type StringLocation ¶

type StringLocation struct {
	Line   int
	Column int
}

StringLocation tracks the line and column number of the lexer's cursor within its input text

func (*StringLocation) Advance ¶

func (l *StringLocation) Advance(in string)

func (StringLocation) String ¶

func (l StringLocation) String() string

type Symbol ¶

type Symbol string // text that was parsed into a token

type SymbolMatcherFunc ¶

type SymbolMatcherFunc func([]byte) (Symbol, bool)

SymbolMatcherFunc is used by the lexical analyser to identify individual symbols in the input text.

- an empty symbol may be successful if the state transition does not require a symbol

e.g. start -> "" -> want_first_symbol

Below, several commonly used SymbolMatcherFunc's are defined for actual use by the lexer

func PrefixMatcher ¶

func PrefixMatcher(wantPrefix string) SymbolMatcherFunc

PrefixMatcher returns a SymbolMatcherFunc that matches a string at the lexer's current position in the input text.

see fsm.RegexpMatcher to match regular expressions
text is matched case-insensitively
Warning: if the last character to be matched is a letter, then the prefix must end at a word boundary

func RegexpMatcher ¶

func RegexpMatcher(pattern string) SymbolMatcherFunc

RegexpMatcher returns a SymbolMatcherFunc that will match the provided regular expression at the current position in the input text.

Note: prepend "(?i)" to the pattern to enable case-insensitivity.

type TableLoader ¶

type TableLoader struct {
	*StateMachine
	// contains filtered or unexported fields
}

func NewLoader ¶

func NewLoader(opts ...TableLoaderOpt) (*TableLoader, error)

func (*TableLoader) UnmarshalText ¶

func (l *TableLoader) UnmarshalText(input []byte) error

UnmarshalText implements the encoding.TextUnmarshaller interface

type TableLoaderOpt ¶

type TableLoaderOpt func(l *TableLoader) error

func FromTableString ¶

func FromTableString(input string) TableLoaderOpt

func WithColumn ¶

func WithColumn(id ColumnID, name string) TableLoaderOpt

func WithInitialState ¶

func WithInitialState(name string) TableLoaderOpt

func WithMatcher ¶

func WithMatcher(keyword string, matcher SymbolMatcherFunc) TableLoaderOpt

func WithRegexpMatcher ¶

func WithRegexpMatcher(keyword, re string) TableLoaderOpt

type Token ¶

type Token struct {
	Name string
	Symbol
	Location string
}

Token represents a correctly parsed part of the input.

func (*Token) String ¶

func (tok *Token) String() string

String returns a fully formatted string containing the token's name (type), its parsed symbol (if known) and location where the token was found in the input.

type Transition ¶

type Transition struct {
	ID      string            // unique state transition identifier
	Skip    bool              // ignore this transition if skip is true
	State   StateName         // originating state (only for debugging)
	Slurp   StateName         // state to "slurp" unnecessary data prior to this state's symbol
	Pattern string            // ???
	Match   SymbolMatcherFunc // static symbol that triggers this transition
	Token   string            // token name to emit when this transition triggers
	Error   string            // error message to emit when this transition triggers
	Push    StateName         // state to push on the stack when this transition triggers (to be popped by another transition later)
	Pop     Symbol            // symbol to match against the top of the stack
	Next    StateName         // next state to transition to when this transition triggers
}

func (*Transition) Validate ¶

func (t *Transition) Validate() error

type WordLocation ¶

type WordLocation int

WordLocation tracks the lexer's index into its slice of words to be parsed

func (WordLocation) String ¶

func (loc WordLocation) String() string

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL