Documentation
¶
Overview ¶
fsm provides a finite state machine parser for parsing e.g. rulers
This state machine can
correctly parse matching nested structures
allows for tabular definitions of a language to be parsed
see also: codeberg.org/japh/go-fsmparser
Overview:
var stateMachine = fsm.LoadStateMachine(...) func (exp *Expression)Parse( input string ) { lexer := fsm.StringLexer( input ) parser := fsm.Parser( fsm.WithStateMachine( stateMachine ), fsm.WithLexer( lexer ), ) for parser.Scan() { token := parser.Next() switch token.Name { ... } } }
State Transition Tables:
State transitions are defined as tupels of information for each possible transition. Each transition describes an edge in a graph, in which each state is represented by a graph node and each transition is an edge between two nodes.
At the very minimum, each state transition must provide:
- an source state
- a symbol to match in the input stream (may be the empty string)
- the name of a token to emit when the symbol is found
- a target state
Which might look something like this:
| state | input | token | next | notes | | --------------- | -------- | ---------- | ------------- | ------------------------------------------------------------------------- | | want_expression | | start_exp | want_term | initial state | | want_term | {number} | term | want_operator | {number} will be matched via a custom matcher | | want_term | | finish_exp | done | an empty symbol always matches - must be the last transition from a state | | want_operator | + | add | want_term | `+` will be matched verbatim | | want_operator | - | subtract | want_term | | | done | | | | final state |
The first row of the table must be a header consisting of column names, but the actual names you use can be configured to your liking.
The full set of transition fields are:
- the source state
- a 'slurp' symbol to be consumed before searching for this state's next symbol
- a symbol to match against the input
- a token to emit when the symbol is found and the trasition is followed
- an error token to emit when this transition is triggered
- a target state
- a state to push on the stack when this transition was triggered
- a symbol to pop from the stack, i.e. the transtion for ')' expects the popped symbol to be '('
e.g.:
loader, err := fsm.NewTableLoader( // column id column name fsm.WithColumn( fsm.StateCol, "state" ), fsm.WithColumn( fsm.SlurpCol, "slurp" ), fsm.WithColumn( fsm.MatchCol, "input" ), fsm.WithColumn( fsm.TokenCol, "token" ), fsm.WithColumn( fsm.ErrorCol, "error" ), fsm.WithColumn( fsm.NextCol, "next state" ), fsm.WithColumn( fsm.PopCol, "pop" ), fsm.WithColumn( fsm.PushCol, "push" ), ... )
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ( ErrMissingStateTable = errors.New("missing state table") ErrMissingInput = errors.New("missing input") ErrBadState = errors.New("failed to parse state") ErrEmptyStack = errors.New("empty stack") ErrUnmatchedSymbol = errors.New("unmatched") ErrEmptyKeyword = errors.New("empty keyword") ErrDuplicateMatcher = errors.New("duplicate matcher name") ErrDuplicateColumnNames = errors.New("duplicate column name") ErrDuplicateColumnIDs = errors.New("duplicate column id") ErrMissingColumnNames = errors.New("missing required name for column") ErrUnmappedColumnName = errors.New("unmapped column name") ErrMissingRequiredColumns = errors.New("missing required column") ErrUnmarshallStateTable = errors.New("failed to unmarshal state table") ErrUnnamedState = errors.New("transitions from an unnamed state are not supported") ErrPushAndPop = errors.New("transitions with push and pull are not supported") )
Functions ¶
This section is empty.
Types ¶
type ColumnID ¶
type ColumnID uint8
const ( StateCol ColumnID = iota // current state SlurpCol // a state to 'slurp' before processing the current state MatchCol // the input to match TokenCol // the name of the token to emit when triggered ErrorCol // the description of an error to emit when triggered PushCol // the next state to return to after the next transition has terminated PopCol // the symbol which was used to start the popped state NextCol // the next state to transition to SkipCol // transition will be ignored if this is not "" )
type Error ¶
Error represents a parsing error
type Lexer ¶
type Lexer interface { Peek() []byte // returns the upcoming bytes Advance(int) // skips over a number of bytes, or a single token Consume() Symbol // consumes everything peeded and advanced since the last Consume Location() string // returns the lexer's current position in its input String() string // returns the position and string at the lexers current position in its input }
Lexer provides a cursor over an input
The lexer used for fsm is a "ratchet"
- the current position in the input text can be advanced by any amount
- symbols can be consumed after multiple advances
- it is not possible to move the cursor backwards
- lexer.Peek() returns the text starting at the current cursor position (pos below)
- lexer.Consume() returns the text between start and pos, and then sets start = pos
e.g.
parsed symbols next symbol unparsed text input: [......................|...........|.....................] ^ ^ start pos
func WordsLexer ¶
wordsLexer scans a pre-split slice of words.
Differences to BytesLexer and StringLexer:
- Peek() only returns the next word
- Consume() is equivalent to Peek() and Advance(1)
- Advance() has no effect
see fsm.Lexer
type Parser ¶
type Parser struct {
*StateMachine // read only, required
}
Parser generates a stream of tokens for the caller.
Note: Parser is not safe for concurrent use!
type Stack ¶
type Stack struct {
// contains filtered or unexported fields
}
Stack is used to track nested structures while parsing.
For example, any pair of matching tokens, such as braces or quotes.
func (*Stack) Pop ¶
func (s *Stack) Pop() (*StackFrame, error)
type StackFrame ¶
type State ¶
type State struct { Name StateName // contains filtered or unexported fields }
func (*State) AddTransition ¶
func (s *State) AddTransition(t Transition)
func (*State) Transitions ¶
func (s *State) Transitions() []Transition
type StateMachine ¶
type StateMachine struct { InitialState StateName // contains filtered or unexported fields }
func LoadStateMachine ¶
func LoadStateMachine(opts ...TableLoaderOpt) *StateMachine
LoadStateMachine reads a table of state transitions into a machine which can be used by the parser to parse the language specified via the state transitions.
var stateMachine = fsm.LoadStateMachine( // where to get the state transitions fsm.FromTableString(stateTransitions), // column definitions of stateTransitions table fsm.WithColumn(fsm.StateCol, "state"), fsm.WithColumn(fsm.MatchCol, "input"), fsm.WithColumn(fsm.TokenCol, "token"), fsm.WithColumn(fsm.NextCol, "next state"), fsm.WithColumn(fsm.ErrorCol, "error"), // special symbol matchers fsm.WithMatcher("{vertical}", fsm.RegexpMatcher(`[|]`)), fsm.WithMatcher("{horizontal}", fsm.RegexpMatcher(`[-=]+`)), fsm.WithMatcher("{space}", fsm.RegexpMatcher(`\s+`)), fsm.WithMatcher("{sort}", fsm.RegexpMatcher(`[\^v]`)), // ^ or v fsm.WithMatcher("{prio}", fsm.RegexpMatcher(`[0-7]`)), // 0 ignored // initial state fsm.WithInitialState("start"), )
// prepare loader l := NewTableLoader(
WithColumn(fsm.StateCol, "a"), WithColumn(fsm.MatchCol, "b"), WithColumn(fsm.TokenCol, "c"), WithColumn(fsm.ErrorCol, "d"), WithColumn(fsm.NextCol, "e"), WithMatcher("{bar}", fsm.RegexMatcher(`bar+`)),
)
// load state machine from table: // | a | b | c | d | e | // | -- | ----- | --------- | - | -- | // | s0 | start | start tok | | s1 | // : : : : : :
if err := l.UnmarshalText([]byte(table)); err != nil { return err }
// parse some input p := NewParser(l.StateTable) p.Parse(input)
REFACTOR: 2025-03-08 add error return
func NewStateMachine ¶
func NewStateMachine() *StateMachine
func (*StateMachine) AddTransition ¶
func (st *StateMachine) AddTransition( id string, skip bool, state, slurp, push, next StateName, pattern string, matcher SymbolMatcherFunc, popSymbol Symbol, token, err string) (Transition, error)
func (*StateMachine) State ¶
func (st *StateMachine) State(name StateName) *State
type StringLocation ¶
StringLocation tracks the line and column number of the lexer's cursor within its input text
func (*StringLocation) Advance ¶
func (l *StringLocation) Advance(in string)
func (StringLocation) String ¶
func (l StringLocation) String() string
type SymbolMatcherFunc ¶
SymbolMatcherFunc is used by the lexical analyser to identify individual symbols in the input text.
- an empty symbol may be successful if the state transition does not require a symbol
- e.g. start -> "" -> want_first_symbol
Below, several commonly used SymbolMatcherFunc's are defined for actual use by the lexer
func PrefixMatcher ¶
func PrefixMatcher(wantPrefix string) SymbolMatcherFunc
PrefixMatcher returns a SymbolMatcherFunc that matches a string at the lexer's current position in the input text.
see fsm.RegexpMatcher to match regular expressions
text is matched case-insensitively
Warning: if the last character to be matched is a letter, then the prefix must end at a word boundary
func RegexpMatcher ¶
func RegexpMatcher(pattern string) SymbolMatcherFunc
RegexpMatcher returns a SymbolMatcherFunc that will match the provided regular expression at the current position in the input text.
Note: prepend "(?i)" to the pattern to enable case-insensitivity.
type TableLoader ¶
type TableLoader struct { *StateMachine // contains filtered or unexported fields }
func NewLoader ¶
func NewLoader(opts ...TableLoaderOpt) (*TableLoader, error)
func (*TableLoader) UnmarshalText ¶
func (l *TableLoader) UnmarshalText(input []byte) error
UnmarshalText implements the encoding.TextUnmarshaller interface
type TableLoaderOpt ¶
type TableLoaderOpt func(l *TableLoader) error
func FromTableString ¶
func FromTableString(input string) TableLoaderOpt
func WithColumn ¶
func WithColumn(id ColumnID, name string) TableLoaderOpt
func WithInitialState ¶
func WithInitialState(name string) TableLoaderOpt
func WithMatcher ¶
func WithMatcher(keyword string, matcher SymbolMatcherFunc) TableLoaderOpt
func WithRegexpMatcher ¶
func WithRegexpMatcher(keyword, re string) TableLoaderOpt
type Transition ¶
type Transition struct { ID string // unique state transition identifier Skip bool // ignore this transition if skip is true State StateName // originating state (only for debugging) Slurp StateName // state to "slurp" unnecessary data prior to this state's symbol Pattern string // ??? Match SymbolMatcherFunc // static symbol that triggers this transition Token string // token name to emit when this transition triggers Error string // error message to emit when this transition triggers Push StateName // state to push on the stack when this transition triggers (to be popped by another transition later) Pop Symbol // symbol to match against the top of the stack Next StateName // next state to transition to when this transition triggers }
func (*Transition) Validate ¶
func (t *Transition) Validate() error
type WordLocation ¶
type WordLocation int
WordLocation tracks the lexer's index into its slice of words to be parsed
func (WordLocation) String ¶
func (loc WordLocation) String() string