fsm

package
v0.3.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 13, 2025 License: MIT Imports: 6 Imported by: 0

Documentation

Overview

fsm provides a finite state machine parser for parsing e.g. rulers

This state machine can

  • correctly parse matching nested structures

  • allows for tabular definitions of a language to be parsed

see also: codeberg.org/japh/go-fsmparser

Overview:

var stateMachine = fsm.LoadStateMachine(...)

func (exp *Expression)Parse( input string ) {
    lexer  := fsm.StringLexer( input )
    parser := fsm.Parser(
                        fsm.WithStateMachine( stateMachine ),
                        fsm.WithLexer( lexer ),
                        )
    for parser.Scan() {
        token := parser.Next()
        switch token.Name {
            ...
        }
    }
}

State Transition Tables:

State transitions are defined as tupels of information for each possible transition. Each transition describes an edge in a graph, in which each state is represented by a graph node and each transition is an edge between two nodes.

At the very minimum, each state transition must provide:

  • an source state
  • a symbol to match in the input stream (may be the empty string)
  • the name of a token to emit when the symbol is found
  • a target state

Which might look something like this:

| state           | input    | token      | next          | notes                                                                     |
| --------------- | -------- | ---------- | ------------- | ------------------------------------------------------------------------- |
| want_expression |          | start_exp  | want_term     | initial state                                                             |
| want_term       | {number} | term       | want_operator | {number} will be matched via a custom matcher                             |
| want_term       |          | finish_exp | done          | an empty symbol always matches - must be the last transition from a state |
| want_operator   | +        | add        | want_term     | `+` will be matched verbatim                                              |
| want_operator   | -        | subtract   | want_term     |                                                                           |
| done            |          |            |               | final state                                                               |

The first row of the table must be a header consisting of column names, but the actual names you use can be configured to your liking.

The full set of transition fields are:

  • the source state
  • a 'slurp' symbol to be consumed before searching for this state's next symbol
  • a symbol to match against the input
  • a token to emit when the symbol is found and the trasition is followed
  • an error token to emit when this transition is triggered
  • a target state
  • a state to push on the stack when this transition was triggered
  • a symbol to pop from the stack, i.e. the transtion for ')' expects the popped symbol to be '('

e.g.:

loader, err := fsm.NewTableLoader(
    //              column id     column name
    fsm.WithColumn( fsm.StateCol, "state"      ),
    fsm.WithColumn( fsm.SlurpCol, "slurp"      ),
    fsm.WithColumn( fsm.MatchCol, "input"      ),
    fsm.WithColumn( fsm.TokenCol, "token"      ),
    fsm.WithColumn( fsm.ErrorCol, "error"      ),
    fsm.WithColumn( fsm.NextCol,  "next state" ),
    fsm.WithColumn( fsm.PopCol,   "pop"        ),
    fsm.WithColumn( fsm.PushCol,  "push"       ),
    ...
)

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrMissingStateTable      = errors.New("missing state table")
	ErrMissingInput           = errors.New("missing input")
	ErrBadState               = errors.New("failed to parse state")
	ErrEmptyStack             = errors.New("empty stack")
	ErrUnmatchedSymbol        = errors.New("unmatched")
	ErrEmptyKeyword           = errors.New("empty keyword")
	ErrDuplicateMatcher       = errors.New("duplicate matcher name")
	ErrDuplicateColumnNames   = errors.New("duplicate column name")
	ErrDuplicateColumnIDs     = errors.New("duplicate column id")
	ErrMissingColumnNames     = errors.New("missing required name for column")
	ErrUnmappedColumnName     = errors.New("unmapped column name")
	ErrMissingRequiredColumns = errors.New("missing required column")
	ErrUnmarshallStateTable   = errors.New("failed to unmarshal state table")
	ErrUnnamedState           = errors.New("transitions from an unnamed state are not supported")
	ErrPushAndPop             = errors.New("transitions with push and pull are not supported")
)

Functions

This section is empty.

Types

type ColumnID

type ColumnID uint8
const (
	StateCol ColumnID = iota // current state
	SlurpCol                 // a state to 'slurp' before processing the current state
	MatchCol                 // the input to match
	TokenCol                 // the name of the token to emit when triggered
	ErrorCol                 // the description of an error to emit when triggered
	PushCol                  // the next state to return to after the next transition has terminated
	PopCol                   // the symbol which was used to start the popped state
	NextCol                  // the next state to transition to
	SkipCol                  // transition will be ignored if this is not ""
)

func (ColumnID) String

func (id ColumnID) String() string

type Error

type Error struct {
	Message error
	Symbol
	Location string
}

Error represents a parsing error

func (*Error) Error

func (err *Error) Error() string

Error returns just the error message, without symbol or location information

fulfil error interface

func (*Error) String

func (err *Error) String() string

String returns a fully formatted string containing the error message, parsed symbol (if known) and the location of the error in the input.

type Lexer

type Lexer interface {
	Peek() []byte     // returns the upcoming bytes
	Advance(int)      // skips over a number of bytes, or a single token
	Consume() Symbol  // consumes everything peeded and advanced since the last Consume
	Location() string // returns the lexer's current position in its input
	String() string   // returns the position and string at the lexers current position in its input
}

Lexer provides a cursor over an input

The lexer used for fsm is a "ratchet"

  • the current position in the input text can be advanced by any amount
  • symbols can be consumed after multiple advances
  • it is not possible to move the cursor backwards
  • lexer.Peek() returns the text starting at the current cursor position (pos below)
  • lexer.Consume() returns the text between start and pos, and then sets start = pos

e.g.

        parsed symbols         next symbol         unparsed text
input: [......................|...........|.....................]
                              ^           ^
                            start        pos

func BytesLexer

func BytesLexer(input []byte) Lexer

BytesLexer provides a lexical scanner for []byte input

see fsm.Lexer

func StringLexer

func StringLexer(input string) Lexer

StringLexer provides a lexical scanner for string input

see fsm.Lexer

func WordsLexer

func WordsLexer(input []string) Lexer

wordsLexer scans a pre-split slice of words.

Differences to BytesLexer and StringLexer:

  • Peek() only returns the next word
  • Consume() is equivalent to Peek() and Advance(1)
  • Advance() has no effect

see fsm.Lexer

type Location

type Location interface {
	String() string
}

type Parser

type Parser struct {
	*StateMachine // read only, required
}

Parser generates a stream of tokens for the caller.

Note: Parser is not safe for concurrent use!

func (*Parser) Tokens

func (p *Parser) Tokens(l Lexer) iter.Seq2[*Token, *Error]

type Stack

type Stack struct {
	// contains filtered or unexported fields
}

Stack is used to track nested structures while parsing.

For example, any pair of matching tokens, such as braces or quotes.

func (*Stack) Pop

func (s *Stack) Pop() (*StackFrame, error)

func (*Stack) Push

func (s *Stack) Push(sym Symbol, state StateName, loc string)

type StackFrame

type StackFrame struct {
	Symbol
	StateName
	Location string
}

type State

type State struct {
	Name StateName
	// contains filtered or unexported fields
}

func (*State) AddTransition

func (s *State) AddTransition(t Transition)

func (*State) String

func (s *State) String() string

func (*State) Transitions

func (s *State) Transitions() []Transition

type StateMachine

type StateMachine struct {
	InitialState StateName
	// contains filtered or unexported fields
}

func LoadStateMachine

func LoadStateMachine(opts ...TableLoaderOpt) *StateMachine

LoadStateMachine reads a table of state transitions into a machine which can be used by the parser to parse the language specified via the state transitions.

var stateMachine = fsm.LoadStateMachine(

	// where to get the state transitions
	fsm.FromTableString(stateTransitions),

	// column definitions of stateTransitions table
	fsm.WithColumn(fsm.StateCol, "state"),
	fsm.WithColumn(fsm.MatchCol, "input"),
	fsm.WithColumn(fsm.TokenCol, "token"),
	fsm.WithColumn(fsm.NextCol, "next state"),
	fsm.WithColumn(fsm.ErrorCol, "error"),

	// special symbol matchers
	fsm.WithMatcher("{vertical}", fsm.RegexpMatcher(`[|]`)),
	fsm.WithMatcher("{horizontal}", fsm.RegexpMatcher(`[-=]+`)),
	fsm.WithMatcher("{space}", fsm.RegexpMatcher(`\s+`)),
	fsm.WithMatcher("{sort}", fsm.RegexpMatcher(`[\^v]`)), // ^ or v
	fsm.WithMatcher("{prio}", fsm.RegexpMatcher(`[0-7]`)), // 0 ignored

	// initial state
	fsm.WithInitialState("start"),
)

// prepare loader l := NewTableLoader(

WithColumn(fsm.StateCol, "a"),
WithColumn(fsm.MatchCol, "b"),
WithColumn(fsm.TokenCol, "c"),
WithColumn(fsm.ErrorCol, "d"),
WithColumn(fsm.NextCol, "e"),
WithMatcher("{bar}", fsm.RegexMatcher(`bar+`)),

)

// load state machine from table: // | a | b | c | d | e | // | -- | ----- | --------- | - | -- | // | s0 | start | start tok | | s1 | // : : : : : :

if err := l.UnmarshalText([]byte(table)); err != nil {
     return err
}

// parse some input p := NewParser(l.StateTable) p.Parse(input)

REFACTOR: 2025-03-08 add error return

func NewStateMachine

func NewStateMachine() *StateMachine

func (*StateMachine) AddTransition

func (st *StateMachine) AddTransition(
	id string,
	skip bool,
	state, slurp, push, next StateName,
	pattern string,
	matcher SymbolMatcherFunc,
	popSymbol Symbol,
	token, err string) (Transition, error)

func (*StateMachine) State

func (st *StateMachine) State(name StateName) *State

type StateName

type StateName string // name of a state
const FINAL_STATE StateName = ""

type StringLocation

type StringLocation struct {
	Line   int
	Column int
}

StringLocation tracks the line and column number of the lexer's cursor within its input text

func (*StringLocation) Advance

func (l *StringLocation) Advance(in string)

func (StringLocation) String

func (l StringLocation) String() string

type Symbol

type Symbol string // text that was parsed into a token

type SymbolMatcherFunc

type SymbolMatcherFunc func([]byte) (Symbol, bool)

SymbolMatcherFunc is used by the lexical analyser to identify individual symbols in the input text.

- an empty symbol may be successful if the state transition does not require a symbol

  • e.g. start -> "" -> want_first_symbol

Below, several commonly used SymbolMatcherFunc's are defined for actual use by the lexer

func PrefixMatcher

func PrefixMatcher(wantPrefix string) SymbolMatcherFunc

PrefixMatcher returns a SymbolMatcherFunc that matches a string at the lexer's current position in the input text.

  • see fsm.RegexpMatcher to match regular expressions

  • text is matched case-insensitively

  • Warning: if the last character to be matched is a letter, then the prefix must end at a word boundary

func RegexpMatcher

func RegexpMatcher(pattern string) SymbolMatcherFunc

RegexpMatcher returns a SymbolMatcherFunc that will match the provided regular expression at the current position in the input text.

Note: prepend "(?i)" to the pattern to enable case-insensitivity.

type TableLoader

type TableLoader struct {
	*StateMachine
	// contains filtered or unexported fields
}

func NewLoader

func NewLoader(opts ...TableLoaderOpt) (*TableLoader, error)

func (*TableLoader) UnmarshalText

func (l *TableLoader) UnmarshalText(input []byte) error

UnmarshalText implements the encoding.TextUnmarshaller interface

type TableLoaderOpt

type TableLoaderOpt func(l *TableLoader) error

func FromTableString

func FromTableString(input string) TableLoaderOpt

func WithColumn

func WithColumn(id ColumnID, name string) TableLoaderOpt

func WithInitialState

func WithInitialState(name string) TableLoaderOpt

func WithMatcher

func WithMatcher(keyword string, matcher SymbolMatcherFunc) TableLoaderOpt

func WithRegexpMatcher

func WithRegexpMatcher(keyword, re string) TableLoaderOpt

type Token

type Token struct {
	Name string
	Symbol
	Location string
}

Token represents a correctly parsed part of the input.

func (*Token) String

func (tok *Token) String() string

String returns a fully formatted string containing the token's name (type), its parsed symbol (if known) and location where the token was found in the input.

type Transition

type Transition struct {
	ID      string            // unique state transition identifier
	Skip    bool              // ignore this transition if skip is true
	State   StateName         // originating state (only for debugging)
	Slurp   StateName         // state to "slurp" unnecessary data prior to this state's symbol
	Pattern string            // ???
	Match   SymbolMatcherFunc // static symbol that triggers this transition
	Token   string            // token name to emit when this transition triggers
	Error   string            // error message to emit when this transition triggers
	Push    StateName         // state to push on the stack when this transition triggers (to be popped by another transition later)
	Pop     Symbol            // symbol to match against the top of the stack
	Next    StateName         // next state to transition to when this transition triggers
}

func (*Transition) Validate

func (t *Transition) Validate() error

type WordLocation

type WordLocation int

WordLocation tracks the lexer's index into its slice of words to be parsed

func (WordLocation) String

func (loc WordLocation) String() string

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL